Journal of the
AMERICAN STATISTICAL 
ASSOCIATION 








CONTENTS for SEPTEMBER 1943 
Observations on the Cost of Living Index of the Bureau of Labor Statistics
    Lazare Teper
Notes on Mr. Teper’s Observations
    Aryness Joy Wickens and Faith M. Williams
Analysis of Variance for Percentages Based on Unequal Numbers
    W. G. Cochran
“Retro” Charts
    Karl Karsten and Edith Brooks
A Mechanical Determination of Correlation Coefficients and Standard Deviations
    John R. Platt
The Evidence for Periodicity in Short Time Series
    Truman L. Kelley
Correlation Concepts and the Doolittle Method
    Dudley J. Cowden
Overestimation of Mean Squares by the Method of Expected Numbers
    R. E. Comstock
Utility of Statistical Method in Aerodynamics
    Herbert G. Smith
On Measures of Dispersion for a Finite Distribution
    Albert O. Hirschman
On Some Census Aids to Sampling
    Morris H. Hansen and W. Edwards Deming
The Social Insurance Movement
    R. Clyde White
Horace Secrist, 1881-1943
    F. S. Deibler
Committee on Nominations


Book Reviews 





VOLUME 38 NUMBER 223 





American Statistical Association 


Organized November 27, 1839 
Incorporated 1841 


The American Statistical Association is a scientific and educational organi- 
zation. Its membership is not confined to professional statisticians but 
includes economists, business executives, research directors, government 
officials, university professors, and other persons who are seriously interested 
in the application of statistical methods to practical problems, in the de- 
velopment of more useful methods, and in the improvement of basic statis- 
tical data. Engineers, mathematicians, biologists, actuaries, sociologists, 
psychologists, and representatives of many other professions are included 
in the membership of the Association. Information about the Association 
and membership application forms may be secured from the Secretary, 
Lester S. Kellogg, 1603 K Street, N. W. Washington 6, D. C. 


War conditions have necessitated the indefinite postponement of the 104th 
Annual Meeting of the American Statistical Association which was to have 
been held in Cleveland, Ohio, December 29 to 31, 1942. 


Chapters of the Association have been organized in important cities in
the United States and in Havana, Cuba. Information concerning the meet-
ings of these Chapters may be secured from the District Representatives
listed on another page.


The Editors welcome the submission of articles and notes for possible 
publication in the Journal. Manuscripts should be sent to the Editor,
Journal of the American Statistical Association, 1603 K Street, N. W., Wash- 
ington 6, D. C. Books for review should be sent to the Review Editor, Glenn 
E. McLaughlin, 1603 K Street, N. W., Washington 6, D. C. Authors who 
wish suggestions about the preparation of manuscripts and charts or infor- 
mation about editorial policies should address their inquiries to the Editor. 


Subscription rate, $6.00 per annum. Single numbers of volumes 23 to 38,
$1.50 per copy postpaid, except Proceedings Supplements to volumes 23 to
30, $2.00 per copy postpaid. Prices for sets and discounts for orders in
quantity available on request.








Published Quarterly by the AMERICAN STATISTICAL ASSOCIATION 


Publication Office: 450 Ahnaip Street, Menasha, Wisconsin. Editorial Office: 1603 K Street, N. W.,
Washington 6, D. C. Acceptance for mailing at special rate of postage provided for in the Act of Febru-
ary 28, 1925, embodied in paragraph 4, section [number illegible], P. L. & R., authorized March 25, 1936. Entered as
second class matter at the post office at Menasha, Wisconsin.











JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


SEPTEMBER, 1943 - $1.50 PER COPY - $6.00 PER ANNUM - VOL. 38 - NO. 223

CONTENTS

ARTICLES

Observations on the Cost of Living Index of the Bureau of Labor Statistics
    LAZARE TEPER
Notes on Mr. Teper’s Observations
    ARYNESS JOY WICKENS AND FAITH M. WILLIAMS
Analysis of Variance for Percentages Based on Unequal Numbers
    W. G. COCHRAN
“Retro” Charts
    KARL KARSTEN AND EDITH BROOKS
A Mechanical Determination of Correlation Coefficients and Standard Deviations
    JOHN R. PLATT
The Evidence for Periodicity in Short Time Series
    TRUMAN L. KELLEY
Correlation Concepts and the Doolittle Method
    DUDLEY J. COWDEN
Overestimation of Mean Squares by the Method of Expected Numbers
    R. E. COMSTOCK
Utility of Statistical Method in Aerodynamics
    HERBERT G. SMITH
On Measures of Dispersion for a Finite Distribution
    ALBERT O. HIRSCHMAN
On Some Census Aids to Sampling
    MORRIS H. HANSEN AND W. EDWARDS DEMING
The Social Insurance Movement
    R. CLYDE WHITE
Horace Secrist, 1881-1943
    F. S. DEIBLER
Committee on Nominations

BOOK REVIEWS

ALTMAN, GEORGE T. Federal Tax Course. James D. Paris
BARGER, HAROLD. Outlay and Income in the United States, 1921-1938. Everett E. Hagen
CHASE, STUART. The Road We Are Traveling. Forrest H. Kirkpatrick
CLARK, CARRIE PATTON, see CLARK, FRED E.
CLARK, FRED E., AND CLARK, CARRIE PATTON. Principles of Marketing. Howard T. Hovde
FURNESS, J. W., see LEITH, C. K.
GRAS, N. S. B. Harvard Co-operative Society, Past and Present. Colston Warne
HARTKEMEIER, HARRY PELLE. An Introduction to Managerial Business Statistics. Donald R. G. Cowan
KATONA, GEORGE. War without Inflation. A. Smithies
LEITH, C. K., FURNESS, J. W., AND LEWIS, CLEONA. World Minerals and World Peace. Wilbert G. Fritz
LEWIS, CLEONA, see LEITH, C. K.
MACHLUP, FRITZ. International Trade and the National Income Multiplier. Paul A. Samuelson
MIGONE, RAUL C. (Editor) Inter-American Statistical Yearbook, 1942. Myron E. Andrews
NIENSTAEDT, L. R. Economic Equilibrium, Employment and Natural Resources. Elmer C. Bratt
ROBISON, SOPHIA M. (Editor) Jewish Population Studies. Georg Wolff
VON MERING, OTTO. The Shifting and Incidence of Taxation. Carl S. Shoup

PUBLICATIONS RECEIVED















JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Number 223 SEPTEMBER, 1943 Volume 38 


OBSERVATIONS ON THE COST OF LIVING INDEX OF 
THE BUREAU OF LABOR STATISTICS* 


By Lazare TEPER 
Director of Research, International Ladies’ Garment Workers’ Union 


THE COST OF LIVING index¹ of the Bureau of Labor Statistics hopes to
measure “changes from time to time in prices to the ultimate con-
sumer of goods purchased by a representative group of wage earners and
lower salaried workers in the larger cities of the country” whose average
incomes in the 1934-36 period amounted to $1524. Data are collected
for 34 cities. The cost of living index for each of the cities is a combina-
tion of index numbers for the following groups representing workers’
spending: food; clothing; rent; fuel, electricity and ice; housefurnish-
ings; and miscellaneous items, which include among others transporta-
tion, medical care, accident and health insurance, household operation,
recreation, personal care and gifts and contributions. When prices are
secured in a particular city for a specific commodity, the quotations for
each individual service or commodity are averaged. The Bureau then


calculates the percentage changes in these average prices from those in the
previous period, using an identical sample of items and outlets, and applies
them to the estimated costs of each item in the previous period to obtain
costs in the current period. A grand total and totals by budgetary groups are
computed. These totals are related to the totals in the base period to obtain
the indexes. Under this procedure the Bureau applies price ratio to basic
expenditures by wage earners and lower-salaried clerical workers to secure
cost aggregates for each pricing period.²

Editor’s Note. Mr. Teper’s article is the first of two or three documents on the Bureau of Labor
Statistics index of the cost of living which will be run in this and following issues of our Journal. At the
time of going to press a Committee of the American Statistical Association is actively reviewing the cost
of living index. This Committee was requested by the Bureau of Labor Statistics and the Secretary of
Labor. It consists of Professor Frederick C. Mills, of Columbia University, Chairman; E. Wight Bakke,
Yale University; Samuel S. Stratton, Middlebury College, Vermont; Theodore W. Schultz (Miss Mar-
garet Reid, alternate), both of Iowa State College; and Reavis Cox, University of Pennsylvania. It is
hoped that its report will appear in the December issue. In some succeeding issue the Bureau of Labor
Statistics may wish to publish further statements regarding the index.

* This paper was presented at the Conference of Research Directors of National and International
Unions in Washington, D. C., on June 23, 1943.

The writer wishes to acknowledge the assistance rendered by Nathan Weinberg and Sheila Helmann
of the ILGWU Research Department in the preparation of the paper.

1 U. S. Bureau of Labor Statistics, Bulletin No. 694: Handbook of Labor Statistics, 1941 edition, I,
p. 83.

The “national” cost of living index is derived from the index numbers 
of the individual cities. Each of the component indices is weighted by 
the relative importance of the population of the particular city, aug- 
mented frequently by the population of not too far distant communities 
of 50,000 and more in population. 
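
The two steps described above, carrying each item's cost forward by its price change and then combining city indexes by population, can be illustrated with a minimal sketch. It is not the Bureau's own computing routine: the function names, the three budget groups, and every figure below are hypothetical and chosen only to make the arithmetic visible.

```python
from typing import Dict


def carry_forward_costs(prev_costs: Dict[str, float],
                        prev_prices: Dict[str, float],
                        curr_prices: Dict[str, float]) -> Dict[str, float]:
    """Apply each item's price relative to its estimated cost in the previous period."""
    return {
        item: prev_costs[item] * curr_prices[item] / prev_prices[item]
        for item in prev_costs
    }


def city_index(base_costs: Dict[str, float], curr_costs: Dict[str, float]) -> float:
    """Relate the current grand total of costs to the base-period total (base = 100)."""
    return 100.0 * sum(curr_costs.values()) / sum(base_costs.values())


def national_index(city_indexes: Dict[str, float],
                   populations: Dict[str, float]) -> float:
    """Weight each city's index by the population it is taken to represent."""
    total_population = sum(populations[city] for city in city_indexes)
    weighted_sum = sum(city_indexes[city] * populations[city] for city in city_indexes)
    return weighted_sum / total_population


# Hypothetical figures, for illustration only.
base_costs = {"food": 500.0, "rent": 300.0, "clothing": 200.0}
base_prices = {"food": 1.00, "rent": 30.0, "clothing": 10.0}
curr_prices = {"food": 1.10, "rent": 30.0, "clothing": 12.0}

# Food rises 10 percent, clothing 20 percent, rent is unchanged.
curr_costs = carry_forward_costs(base_costs, base_prices, curr_prices)
index_city_a = city_index(base_costs, curr_costs)

# City B's index and both population weights are assumed values.
combined = national_index({"A": index_city_a, "B": 120.0},
                          {"A": 3_000_000, "B": 1_000_000})
print(round(index_city_a, 2), round(combined, 2))
```

In this sketch the current period follows the base period directly. Over several periods the same function would be applied repeatedly to the costs of the period just before, which is the chaining discussed later in the paper.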

Measurement of changes in living costs is important to the wage 
earner and his family at all times. Low income-bracket families feel 
keenly fluctuations in the purchasing power of their wage dollar. In 
periods like the present, when the pressure of rising prices makes itself 
felt on the one hand, and when the ability to secure wage adjustments 
is limited by governmental edict on the other, interest in cost of living 
figures and their accuracy is naturally uppermost in many minds. This 
interest is spurred on by the natural bias of the individual who thinks of 
rising prices in terms of the few commodities which have risen most in 
price. This mental weight given to those commodities which lead the 
price parade frequently obscures the real picture and makes for a not 
too well justified criticism of governmental figures. However, criticisms 
of the present Bureau of Labor Statistics cost of living index are not 
restricted to the biased. Labor economists, business publications, and 
government officials recognize today that the cost of living figures leave 
much to be desired.³ In all fairness it should be noted that the Bureau does indi-
cate, even if occasionally and in a weak voice, that “cost of living in-
dexes have certain limitations which should be remembered by those
who use them,”⁴ as well as that “the index is an approximate measure.”⁵

A considerable amount of criticism arises through the fault of the
Bureau itself. It is much too eager at times to defend its product. Thus
despite all the difficulties which are inherent in securing black market
prices, the Bureau is emphatic in stating that its figures “are based on
actual selling prices, regardless of OPA ceilings.”⁶ Yet, prevailing opinion
does not credit the Bureau’s contention. Professor Irving Fisher,
discussing recent increases in the cost of living, states that “probably
the rises were actually greater than these records show, since the rec-
ords can take no account of the ‘black markets,’ the bootlegging of
goods outside of the regular channels of trade.”⁷ A business periodical
honestly remarks:

These illegal price gains, of course, will not show up in official indexes of the
cost of living. Many items are priced at the carefully complying big stores,
and, in any case, the obvious federal agent will hardly be quoted above-
ceiling prices by even the least sharp-eyed seller.⁸

2 Robert A. Sayre, “Cost of Living Indexes,” in this Journal, June 1941, p. 193. See also U. S.
Bureau of Labor Statistics, Bulletin No. 699: Changes in Cost of Living in Large Cities in the United
States, 1913-41, p. 35.

3 To cite but one example of interdepartmental criticism, we may refer to the U. S. Department of
Commerce, Survey of Current Business, February 1943, p. 2, which speaks thus of the Canadian as well
as the United States indexes: “Both the indexes probably understate the true rise in living costs because
of quality deterioration, illegal price advances, and changes in consumption patterns.”

4 U. S. Bureau of Labor Statistics, Bulletin No. 699, p. 8.

5 “What is the Cost-of-Living Index?” in Monthly Labor Review, August 1942, p. 273.

6 U. S. Bureau of Labor Statistics, The Cost of Living and Retail Costs of Food, March 15, 1943
(mimeographed release). A somewhat different statement appears in ibid., January 15, 1943: “The
Bureau of Labor Statistics’ cost of living index reflects actual prices in retail stores where families with
moderate incomes trade. Black market operations or sales to customers who pay bonuses for service
cannot, however, be measured.” The writer does not imply that the Bureau does not secure a certain
number of over-the-ceiling price quotations.

A government official categorically states that “Black markets and
the inability to obtain supplies at any price are not, of course, meas-
ured by price statistics,” even though he notes that “the food stores
reporting to the Bureau of Labor Statistics actually report their own
violations in a surprising number of instances”⁹ either through confi-
dence in the Bureau or because of their ignorance of regulations. It is
obvious that nothing short of actual purchase by field agents will pro-
vide the Bureau with information as to the “actual selling prices.”

Nor does the Bureau, in discussions of its cost of living index, ade-
quately point out the real nature of the problem involved in pricing
identical commodities at different pricing periods.¹⁰ Stress in the Bu-
reau’s publications is laid on the fact that with present day techniques
a “more consistent price reporting is possible.”¹¹ This, of course, is an
accurate observation, but it begs the question. Granting that progress
has been made, how much further does the Bureau have to go to insure
adequate comparability of goods to be priced? The existing specifica-
tions, although more detailed than those of several years ago, continue
to be couched frequently in vague, suggestive terms, rather than in
terms which permit comparability controls.¹² The statement that

since 1935, the Retail Price Division of the Bureau of Labor Statistics has
been using quite precise commodity specifications in pricing at retail, in
order to be assured that prices at two successive periods refer to the same
commodities, the same not only in size and appearance, but in quality as
well¹³

clearly overstates the case. The memory of the field agents and of the
retail store personnel¹⁴ must of necessity constitute the major control as
to what a particular article was like at the time of the last visit of
the agent as against the article priced during the current visit.¹⁵ No
experimental controls have been attempted by the Bureau to test the
degree of possible error in judging comparable qualities. Even if the
“scientific testing of consumers’ goods for durability and for efficiency
in performing the services for which they are intended is still in its in-
fancy,”¹⁶ the fact remains that

Performance tests have been developed by the Federal government in order
to improve its purchasing procedures, and in order to administer the tariff
or acts intended to protect the consumer, or to improve trade practices.
Other performance tests have been worked out by industrial engineers work-
ing for manufacturers or distributors. In some cases these industry tests
have been reviewed by the scientific societies concerned and the American
Standards Association, and have become standard practice. . . .¹⁷

7 Irving Fisher, “Inflation; Its Chief Causes and Remedies,” in Union Label Feature Service, Release
no. 31, January 9, 1943, p. 3.

8 Business Week, December 19, 1942, p. 13.

9 Don D. Humphrey, “The Effectiveness of Price Control,” in U. S. Department of Commerce,
Survey of Current Business, February 1943, p. 24.

10 In many fields competitive rivalry centers largely or exclusively upon non-price elements. “The
nature of the commodity may be such as to make price comparisons difficult and emphasis upon non-
price factors logical. When this tendency exists, it may be reinforced by the development of conven-
tional patterns of business practice.” (Temporary National Economic Committee, Monograph No. 1:
Price Behavior and Business Policy, p. 70). Under such conditions considerable danger exists that the
continued existence of a particular price line may be construed as no change in price, and that price
changes will only be recognized when there is a shift to new price lines. To the extent that such approach
may exist, the indexes for the individual commodities merely establish the fact that a commodity of the
particular general class can be obtained at a particular price. As such the index fails of course to fulfill
the theoretical requirements of a cost of living measure.

11 U. S. Bureau of Labor Statistics, Bulletin No. 699, p. 6.

12 See for example ibid., p. 7, for a specification of a men’s medium quality wool suit. As any person
in the trade could point out, considerable variations in the quality of the suit are possible within the
framework of this definition.


It would seem that it is a function of the Bureau to develop, either 
on its own behalf, or jointly with the scientific organizations, available 
information on performance. Such information could then be utilized 
first for determining the degree of accuracy in the field agents’ estimates 
of comparability. Once the extent of error is established, this informa- 
tion could be used to introduce a “corrective factor” into the index 
which would account, in a more precise fashion than at the present, for 
the “hidden price changes.”¹⁸ At present, however, the initiative which
the Bureau could exercise is dissipated, while the problem of accounting 
for quality changes remains unsolved. 
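
One hypothetical way such a “corrective factor” might be computed is sketched below. Nothing in it comes from the Bureau or from the sources cited: it assumes that agents buy the same nominal article in two periods, that a laboratory assigns each purchase a performance score, and that price per unit of score is a fair basis of comparison. All names and numbers are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple

# One purchased pair: (price then, price now, quality score then, quality score now).
Purchase = Tuple[float, float, float, float]


def quoted_relative(p0: float, p1: float) -> float:
    """Price relative as it would be recorded if the two articles were treated as identical."""
    return p1 / p0


def quality_adjusted_relative(p0: float, p1: float, q0: float, q1: float) -> float:
    """Price relative per unit of measured performance (hypothetical score q)."""
    return (p1 / q1) / (p0 / q0)


def corrective_factor(purchases: List[Purchase]) -> float:
    """Average ratio of the quality-adjusted relative to the quoted relative."""
    ratios = [
        quality_adjusted_relative(p0, p1, q0, q1) / quoted_relative(p0, p1)
        for p0, p1, q0, q1 in purchases
    ]
    return mean(ratios)


# Hypothetical test purchases: the first article kept its price but lost a fifth
# of its measured quality; the second rose in price with quality unchanged.
test_purchases = [
    (20.0, 20.0, 100.0, 80.0),
    (5.0, 5.5, 50.0, 50.0),
]

factor = corrective_factor(test_purchases)
quoted_group_index = 105.0            # assumed index built from quotations alone
corrected_group_index = quoted_group_index * factor
print(round(factor, 3), round(corrected_group_index, 2))
```

A factor above 1 indicates that quotations taken at face value understate the price rise. How many test purchases would be needed before such a factor could be trusted is a separate sampling question that the sketch does not address.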


13 Faith M. Williams, Frances R. Rice and Emil D. Schell, “Cost of Living Indexes in Wartime,”
in this Journal, December 1942, p. 419.

14 It is conceded presumably that there exists an “inadequacy of retailers’ knowledge of their own
merchandise” (Ibid.).

15 An interesting side light is thrown on this problem by the affidavits prepared by the technical
personnel of the OPA Standards Division in connection with its investigation of price violations by the
Hecht Co., Washington, D. C. These searching investigations revealed that, despite the admitted effort
made by this company to comply with the OPA price ceiling, the store buyers were unable to locate,
without considerable error, garments of comparable quality to those sold during the base period. (Affi-
davits of Raymond J. Singer and Robert E. Vanbrunt, in the files of the District Court of the United
States for the District of Columbia, Civil Action File No. 17995: Leon Henderson v. The Hecht Co.)

16 Williams, Rice and Schell, supra, p. 420.

17 Ibid.

18 To implement such a program, agents of the Bureau must purchase, at different pricing periods,
a certain number of articles on which they normally obtain quotations. This program could be combined
with purchases necessary to determine the extent to which black market prices prevail.





























A certain amount of misunderstanding arises from the fact that, by
and large, except for the description of the general make-up of the index,
little is known, even among those interested in the subject, about the
actual handling of the data once it is collected. A case in point is
well illustrated by the Bureau’s study of its new index, published as its
Bulletin No. 699. The advent of this study was made known in a previ-
ously published article in the Monthly Labor Review which stated:

A detailed consideration of all the steps involved in computing the new
index is outside the scope of the present article. A bulletin giving, in addition
to a detailed account (my italics, L. T.) of the various procedures employed,
a list of the weights for each city is now in preparation.¹⁹

This promise however was but partly kept. Bulletin No. 699 does
give the population weights used for combining index numbers for the
individual cities into composite indexes for the United States. But it
does not give a “detailed account” of all the procedures employed. For
one thing the serious problem of securing price quotations is treated
most generally.²⁰ Yet it is a neglected subject in the literature of sta-
tistics,²¹ certainly if compared to the amount of space in statistical liter-
ature devoted to the lesser question of the index number formulae.²²
One could certainly wish the Bureau would devote considerably more
space to the public discussion of this problem. Free and easy access
to specifications used by the Bureau in collecting retail prices is not
enough. Instructions issued to field agents (other than those which are
purely administrative in character) should be regularly released. All
changes in specifications which find their way into the monthly tabula-
tions should be made known. Special problems encountered in the
course of collecting data should be given. Similarly it would be desirable
for the Bureau to release regularly its instructions to the personnel
entrusted with editing and tabulating information secured in the field.
It is hardly satisfactory to be told in general terms that some adjust-
ments were made in the index.²³


19 “U. S. Bureau of Labor Statistics’ New Index of Cost of Living,” in Monthly Labor Review,
August 1940, p. 369.

20 The entire discussion of this subject is compressed into less than 3 pages.

21 This does not mean that the subject received no attention whatsoever. Wesley C. Mitchell’s
pioneer work, “The Making and Using of Index Numbers” devotes several pages to the topic (U. S.
Bureau of Labor Statistics, Bulletin No. 284, Index Numbers of Wholesale Prices in the United States and
Foreign Countries, pp. 25 ff.). The subject was forcefully brought to the fore by Willard L. Thorp (see his
“Price Theories and Market Realities” in American Economic Review, March 1936, supplement, pp.
14 ff.). An interesting discussion of the problem is also found in the U. S. National Resources Committee,
The Structure of the American Economy, I, 173 ff. See also Williams, Rice and Schell, supra.

22 “The discussion concerning index number technique has turned mainly on the selection of data
and the algebraic characteristics of various formulas, with insufficient attention being given to the
fundamental nature of the phenomena dealt with.” (Wirth F. Ferger, “Distinctive Concepts of Price
and Purchasing-power Index Numbers,” in this Journal, June 1936, p. 258).

23 An example of such generalities may be found in the U. S. Bureau of Labor Statistics, “Description
of Changes in the B.L.S. Cost of Living Index as of March 15, 1943” (mimeographed release).
This release states that “adjustments were made for changes in the volume of food sold through chains
and independent stores and supermarkets.” What was the nature of these adjustments? On what in-
formation did the Bureau rely in making changes? What provisions if any were made for any future
shifts as between retail outlets? Similarly it is important to know when certain cases of substitutions
are introduced not by a linking method but as price changes (U. S. Bureau of Labor Statistics, Bulletin
No. 699, p. 7).

If such information were generally available it could go a long way
toward clearing up misunderstandings between the Bureau and the
users of its figures. Such information could readily be incorporated into
special technical bulletins, to be released at monthly intervals, to inter-
ested persons and groups. These technical bulletins could also serve
another function: a discussion of the theoretical problems concerning
the significance of the index, and their reconciliation with the practical
limitations in the index construction. In theory the cost or price of liv-
ing may be defined as “the ratio between incomes securing the same
standard of living, or, in other words, procuring the same sum total of
economic satisfaction, in two situations which differ only in respect of
prices.”²⁴ Even if the above relationship holds true between two con-
secutive pricing periods, what is the significance of the index when
radical changes in the actual level of living which is priced take place
over a period of time? It is usually agreed that when “the links are
chained together, precise comparability is lost, and the meaning of the
index cannot be stated in simple language.”²⁵ What biases are intro-
duced by chaining link relatives to a single index? Warren Persons notes
that

It has frequently been observed, in actual practice, not only that chain
indexes with changing weights differ from corresponding fixed base indexes,
but also that the divergence of the two indexes tends to persist and even to
become irregularly greater and greater as the number of links in the chain
increases.²⁶

The problem has been recognized by the Bureau of Labor Statistics
in connection with its index of wholesale prices,²⁷ and also in connection
with the data on employment and payrolls.²⁸ What recognition has
been given to this factor in connection with the cost of living index?

24 International Labour Office, Studies and Reports, Series N (Statistics), No. 20, International
Comparisons of Cost of Living: A Study of Certain Problems Connected with the Making of Index Numbers
of Food Costs and of Rents, p. 74.

25 Frederick E. Croxton and Dudley J. Cowden, Applied General Statistics, p. 621.

26 Warren Milton Persons, The Construction of Index Numbers, p. 65.

27 Jesse M. Cutts and Samuel J. Dennis, “Revised Method of Calculation of the Wholesale Price
Index of the United States Bureau of Labor Statistics,” in this Journal, December 1937, pp. 665 f.

28 “The weakness of chain indexes is the possibility of errors of a progressive and cumulative char-
acter. With a chain type of index an error in calculating the percentage of change in any one month not
only affects the index for that particular month but is carried to all subsequent items of the series.”
(U. S. Bureau of Labor Statistics Bulletin No. 610: Revised Indexes of Factory Employment and Pay Rolls,
1919 to 1933, p. 11.)
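
Both weaknesses of chaining noted here, divergence from a fixed base index and the carrying forward of a single error, can be seen in a small numerical sketch. It uses a Laspeyres-type link whose weights are updated each period, which is an assumption made for illustration and not a statement of the Bureau's formula. The two-commodity prices and quantities are invented.

```python
from typing import List, Sequence


def laspeyres(p_now: Sequence[float], p_ref: Sequence[float],
              q_ref: Sequence[float]) -> float:
    """Cost of the reference-period quantities at current prices over their cost at reference prices."""
    cost_now = sum(p * q for p, q in zip(p_now, q_ref))
    cost_ref = sum(p * q for p, q in zip(p_ref, q_ref))
    return cost_now / cost_ref


def fixed_base_series(prices, quantities) -> List[float]:
    """Every period compared directly with period 0, using period-0 quantities."""
    return [100.0 * laspeyres(p, prices[0], quantities[0]) for p in prices]


def chained_series(prices, quantities) -> List[float]:
    """Each link uses the previous period's quantities; links are multiplied together."""
    series = [100.0]
    for t in range(1, len(prices)):
        link = laspeyres(prices[t], prices[t - 1], quantities[t - 1])
        series.append(series[-1] * link)
    return series


def chain_from_links(links: Sequence[float], base: float = 100.0) -> List[float]:
    """Build an index from given period-to-period relatives."""
    series = [base]
    for link in links:
        series.append(series[-1] * link)
    return series


# Hypothetical data: the first price doubles and then returns to its old level,
# while buyers shift away from the dearer good in the middle period.
PRICES = [(1.0, 1.0), (2.0, 1.0), (1.0, 1.0)]
QUANTITIES = [(10.0, 10.0), (4.0, 16.0), (10.0, 10.0)]

fixed = fixed_base_series(PRICES, QUANTITIES)
chained = chained_series(PRICES, QUANTITIES)

# A single mistaken link (1.02 recorded where 1.01 was correct) in the second period.
true_chain = chain_from_links([1.02, 1.01, 1.03, 1.00])
faulty_chain = chain_from_links([1.02, 1.02, 1.03, 1.00])
error_ratios = [f / t for f, t in zip(faulty_chain, true_chain)]

print([round(x, 1) for x in fixed], [round(x, 1) for x in chained])
print([round(r, 4) for r in error_ratios])
```

In the first comparison prices end exactly where they began, yet the chained series does not return to 100 while the fixed base series does. In the second, the error ratio stays at 1.02/1.01 in every period after the faulty link, which is the cumulative character described in the footnote on employment indexes.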










The Bureau admittedly is making a number of adjustments in the index
which affect its make-up.²⁹ Quantity weights are changed to allow for
changes in the supply situation. To the extent that the needs of war
economy bring about a deterioration in living standards,³⁰ budgets now
priced by the Bureau (even if we ignore substitutions for those goods
and services which are no longer available) are representative of a lower
level of living. Do we on that basis obtain valid comparisons of the
changes in the cost of living with the pre-war period? Do the substitu-
tions of commodities made today affect the index to the same extent
as substitutions in a period of normalcy, when changes in the types of
available goods and in consumption habits are a slow and gradual
process? Should the Bureau change consumption weights for those
items that continue to be available in civilian markets?³¹ Should not
the Bureau preserve, to the extent that it is possible, the level of con-
sumption priced in pre-war days and allow in the budgets which it
prices equivalent expenditures (at the time substitutions are made) in
terms of commodities and services which replace the ones no longer
obtainable?³² On the other hand, should a different view be taken, can
the Bureau contend that its index measures only changes in the cost of
living and not of the plane of living as well?

The chaotic situation in retail markets during war time brings an-
other problem to the fore. Are individual price samples collected by the
Bureau reliable indicators of price movements in particular cities?
When it comes to commodities and services other than food or rents,
the Bureau is satisfied as a rule to secure four quotations, except in the
city of New York where five quotations are secured.³³ This is a fact
not well enough known.³⁴

29 Williams, Rice and Schell, supra; U. S. Bureau of Labor Statistics, Description of Changes in the
B.L.S. Cost of Living Index as of March 15, 1943. Also “Bureau of Labor Statistics Cost of Living Index
in Wartime,” Monthly Labor Review, July, 1943.

30 “As the production machinery of the country is converted to war purposes, the goods and serv-
ices produced for consumers are undergoing more drastic changes in the course of a single year than
occurred in the entire period from 1919 to 1935, but in the opposite direction.” (Williams, Rice and
Schell, supra, p. 416.)

31 A question well may be asked regarding the treatment of individual price series at the time the
cheaper grades of a particular commodity disappear under the stress of war economy and only more
expensive grades remain: how are such situations to be treated? It is the practice of the Bureau to treat
such situations not by linking them with the older series but as price changes. (U. S. Bureau of Labor
Statistics Bulletin No. 699, p. 7.) This method is not inconsistent with the point made in the text which
applies primarily to situations when goods which disappeared are replaced with quotations of com-
modities in different classifications. Forced upgrading of the consumer is definitely in the same class with
situations when under the stress of necessity the worker is forced to trade in more expensive stores.

32 A somewhat similar principle has been recognized in the construction of the index of wholesale
prices. When a particular series is replaced with another, an adjustment of the “quantity multiplier to
take account of the substitution” is “made on the basis of the typical relationship between the two se-
ries” (Cutts and Dennis, supra, p. 669).

33 U. S. Bureau of Labor Statistics, Retail Prices: Instructions for Collecting Retail Prices, Commodi-
ties and Services (Excluding Rents), September 1940 (mimeographed), paragraph 138, p. 16, gives the
following instructions to field agents: “Secure four quotations for each item on the schedules in every






















It is questionable whether a sampling of 4 or 5 similar (or identical) 
commodities yields a representative value. Let us consider an example. 
According to the Census of Distribution for 1939, dresses were retailed 
by 567 stores in New York City divided as follows:³⁵


                                 Number of stores
                                 reporting sales     Dress dollar volume
Type of store                    of dresses          of reporting stores

Department stores                       25              $11,373,000
Family clothing stores                  35              $ 8,959,000
Women’s ready to wear stores           507              $58,217,000


The above figures do not represent all stores selling dresses in the city 
of New York as the 1939 census records the existence of 2818 stores (in 
the above listed classifications) doing a total dollar volume of $502,099,000
as against 632 stores doing a total volume of $435,165,000
which have reported product breakdowns. While it is undoubtedly 
true that not all of these stores catered to a clientele of wage earners 
and lower salaried workers, it is nevertheless obvious that quotations 
from 5 stores are hardly representative of prices in the New York re- 
tail dress market. The sample is so small that it is doubtful whether 
any test of significance could be devised which could be applied to the 





city except New York where five are to be secured.” For list of commodities to which these instructions 
apply see ibid. pp. 18 f. Undoubtedly in some instances more than one article in the same general class 
may be priced. “For a large proportion of the items included in the index more than 1 quality is priced;
in the case of the more important items, as many as 4 in a given city.” (U. S. Bureau of Labor Statistics
Bulletin No. 699, p. 15.) 

34 U.S. Bureau of Labor Statistics Bulletin No. 699, although presumably a complete discussion of
the index and of its construction, does not mention it. It does however mention the number of outlets
reporting retail food prices for June 1941 (ibid., p. 88). According to another of the Bureau's publications, the
number of firms reporting food prices varies from 10 in the smallest cities to 50 in the largest, “roughly
proportional to the population” (U.S. Bureau of Labor Statistics, Bulletin No. 635: Retail Prices of Food,
1923-36, p. 10). Nor does this study indicate the size of the rent sample other than stating that “Advan- 
tage was taken of the Real Property Inventory and of local studies of housing to secure a sample which 
would be representative of housing conditions in the cities covered.” (U. S. Bureau of Labor Statistics, 
Bulletin No. 699, p. 20.) 

Miss Faith M. Williams writes on July 16, 1943, as follows: “The number of outlets from which prices 
are obtained in each city was inadvertently omitted from Bulletin 699. It had been published previously, 
however, in the August, 1936, February, 1938, and November, 1938, pamphlets on Retail Prices and in 
a mimeographed publication on retail prices of hosiery by areas (June, 1940). In response to requests 
for average prices of commodities other than food, the Bureau always explains that its reason for not 
releasing such figures is that only four outlets are covered in each city except New York where five 
are covered. In considering the number of outlets covered for each commodity priced, one must take 
account of the fact that for the clothing and housefurnishings included in the index, more than one 
quality and for some goods as many as four qualities are priced in each outlet. There is undoubtedly 
an error of estimate in average price changes based on four quotations, but there is no reason to assume 
that there is a bias in the errors of estimate. The clothing index for each city for each date is based on 
421 quotations; the housefurnishings index on 129 quotations. The Bureau does publish per cent changes 
in costs of individual items for all cities combined and each one of these is based on 137 price quotations 
for each date.” Miss Williams’ assumption that there is no bias in the errors of estimate, i.e., that the er-
rors of estimate remain constant over a period of time, is unduly optimistic. 
35 U.S. Census Bureau, 1940 Census of Business, I (2), 234, 310, 312.





















data so collected. The presence of pronounced dispersion in the data so 
collected may further weaken the validity of the average quotation. 
It is interesting to note in this connection that instructions to the 
Bureau’s field agents make the following comment on the presence of 
dispersion in the price quotations secured in a particular city: 

Dispersions. Prices on an item conforming to the same specification will
usually show some dispersion. However a range of prices which exceeds 50
per cent from lowest to highest is considered unusual and must be explained
on the tally.36

It is therefore imperative for the Bureau to increase the size of the 
samples of many of the commodities it prices. For only in such a man- 
ner can it hope to eradicate the errors of inadequate sampling. 
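
How far four or five quotations can pin down an average may be sketched with the textbook standard-error relation. The lines below assume independent quotations sharing a common relative dispersion; the 15 per cent figure is a purely illustrative assumption, not a Bureau statistic, and the function name is hypothetical. The quotation counts are those cited in the text and in Miss Williams' letter, and treating them all as draws from one distribution is a simplification.

```python
import math


def relative_standard_error(cv: float, n: int) -> float:
    """Standard error of the mean of n independent quotations, relative to the mean.

    cv is the coefficient of variation of a single quotation
    (an illustrative assumption, not a figure from the Bureau).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of quotations")
    if cv < 0:
        raise ValueError("cv must be non-negative")
    return cv / math.sqrt(n)


ASSUMED_CV = 0.15  # hypothetical relative dispersion of a single quotation

if __name__ == "__main__":
    # 4 and 5: outlets per city; 137 and 421: quotation counts cited in the letter.
    for n in (4, 5, 137, 421):
        rse = relative_standard_error(ASSUMED_CV, n)
        print(f"n = {n:3d}  relative standard error = {rse:.4f}")
```

Under these assumptions the error of an average shrinks only with the square root of the number of quotations, so moving from 4 to 16 quotations halves it.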

Another observation pertaining to the securing of price quotations
may be made at this point. The Bureau customarily obtains regular
prices from its contacts in the retail field. Sale prices, i.e., markdown 
prices, according to the Bureau’s instructions to its field agents may 
be taken under the following conditions: 

The period of sale must be two weeks or longer and must include the date 
of the pricing period. 


The sale must not be one for the purpose of discontinuing the item per-
manently. This does not refer to seasonal clearance sales.37


Similar instructions are issued in connection with the collection of retail 
prices of food: 
Only the “regular” price is to be obtained for individual items. This excludes
clearance, distress and bankruptcy prices, and also special offers. “Special
sale” prices may be accepted where the “special” prevails for several days.38


In normal times the wage earner and his family frequently resort to
buying “specials” at sales. To the extent that sales, whether of one
day’s duration or longer, continue from year to year and account for
an approximately constant share of the consumer dollar, their
incomplete recognition by the cost of living index would not
materially influence its trend. However, in times like the present, the
relative importance of special sales has materially declined. While the
1942 figures have not been released at the time of this writing by the
National Retail Dry Goods Association, an analysis of the relative
importance of markdowns in department stores between 1935 and 1941
shows definitely that they are on the decline.

36 U. S. Bureau of Labor Statistics, Retail Prices: Instructions for Collecting Retail Prices, Commodi-
ties and Services (Excluding Rents), September 1940, paragraph 155, p. 17. 
37 Ibid., paragraphs 141 and 142, p. 15.


38 U. S. Bureau of Labor Statistics, Retail Prices: Instructions for Collecting Retail Prices, Food, 
March 1942 (mimeographed), paragraph 18, p. 2. 













Markdowns39 as Per Cent of Retail Sales, Department Stores with
Annual Sales Volume over $500,000, 1935-1941

                MARKDOWN PERCENTAGE
Year        Main Store      Basement
1935           7a             6.9
1936           6.6            6.3
1937           7.0            6.5
1938           7.3            6.6
1939           6.9            6.2
1940           6.5            6.1
1941           5.4            5.3


When the volume of special sales undergoes a significant variation, 
it is desirable to record their influence in the cost of living index. While 
it is manifestly impossible to secure sales quotations, particularly if 
sales are prone to be of short duration, and while it is difficult to obtain 
the relative volume of business done at regular prices as distinguished 
from the sales prices, it may be possible to work out a series of correc- 
tion factors which, though not obtainable on a current basis, could 
nevertheless be periodically incorporated in the index for the purpose of 
correction of price data. If the relative importance of markdowns is 
known for a class of commodities, it may be used for the purpose of 
constructing such an adjustment factor.40
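
The adjustment outlined in footnote 40 may be sketched as follows. The sketch assumes, as the footnote does, that field agents quote the regular price p(1 + m), so that the average price actually paid is the quotation divided by (1 + m). The function names are hypothetical, and the two markdown ratios are the 1936 and 1941 main-store figures from the table above.

```python
def actual_price(regular_price: float, m: float) -> float:
    """Average price actually received, given the quoted regular price p(1 + m).

    m is the ratio of markdowns to sales volume (e.g. 0.054 for 5.4 per cent).
    """
    if m <= -1.0:
        raise ValueError("m must exceed -1")
    return regular_price / (1.0 + m)


def markdown_correction_factor(m_base: float, m_current: float) -> float:
    """Factor by which an index of regular prices would be multiplied to
    reflect prices actually paid, relative to the base year."""
    return (1.0 + m_base) / (1.0 + m_current)


M_1936 = 0.066  # main store, 1936 (from the table)
M_1941 = 0.054  # main store, 1941 (from the table)

if __name__ == "__main__":
    factor = markdown_correction_factor(M_1936, M_1941)
    print(f"correction factor, 1936 to 1941: {factor:.4f}")
```

Because markdowns fell between the two years, the factor exceeds one: on these assumptions prices actually paid rose somewhat more than the regular-price quotations alone would show.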

An adjustment factor is probably needed in connection with the
changes in the Bureau’s practice in collecting data on rents. In the
past rental quotations were obtained from rental agencies. However,
beginning with September 1942, rent quotations in 13 cities were ob-
tained from tenants instead.41 (This practice has now been extended
to all cities.) If discrepancies were found between quotations secured
from rental agencies and those of tenants as a result of war conditions,
allowance should have been introduced in the cost of living index.42

The cost of living index of the Bureau of Labor Statistics is sup- 
posed to measure the cost of goods and services purchased by wage 

39 National Retail Dry Goods Association, Controllers’ Congress, Departmental Merchandising 
and Operating Results of Department Stores and Specialty Stores (for the years 1935 through 1941). 

40 If in a particular year the percentage of mark-downs to the volume of sales (S) amounted to m,
then, for each of the years which we wish to study, S = pq, where p is the average price actually received
by the merchant for his merchandise, and q is the quantity sold. If there were no “special” sales respon-
sible for markdowns then the “value” of sales (V) would have been V = S(1 + m) = q[p(1 + m)]. If it is
the practice of field agents to get the quotations of “regular” prices, i.e. p(1 + m), then the actual aver-
age prices obtained from the store’s customers can be computed, when m is known.

41 Williams, Rice and Schell, supra, p. 421. 

42 Miss Faith M. Williams writes on July 16, 1943, as follows: “Discrepancies were found between
percentage changes shown by quotations obtained from rental management agencies and in the rents as
reported in the tenant survey in four cities. The following figures show the indexes as originally published
and as revised. Differences between the averages from the rental management surveys and the tenant


















earners and lower-salaried workers. Yet one of the services paid for 
by them is but partially represented in the index—the cost of maintain- 
ing the services and functions of government. Sales taxes are added to 
the cost of the commodities on which they are imposed. Automobile 
and other taxes are specifically included. Property taxes are indirectly 
reflected in rentals. However, income taxes and social security taxes 
are not included in the index. 

The reasoning for the exclusion of income taxes is not altogether
clear. The Handbook of Labor Statistics suggests that “income taxes
paid have . . . been omitted as they have heretofore applied to a very
small proportion of the groups whose living costs the indexes attempt to
measure.”43 A recent article by three officials of the Bureau who are
concerned with the compilation of the index, however, rationalized the
exclusion on the ground that “income taxes . . . are considered as de-
ductions from income and not as current expenditure.”44 The Bureau’s
bulletin intended as a reference book for persons using the cost of living
indexes merely states, “Income taxes paid have also been omitted.”45

Prior to the present war, for a number of years, income taxes in the 
low income brackets were insignificant in amount, if any tax was paid 
at all. Furthermore there was little variation in the tax rates. They 
could therefore be disregarded in the computation of cost of living 
indexes, as they would not have affected the course of the index in the 
slightest degree. Since those days, however, the tax picture has under- 
gone a radical change. Even persons in the lowest income brackets must 
pay something toward governmental services. Little justification exists 





surveys were not significant in the other cities covered.”

(1935-39 = 100)            Rental manage-      Tenant
                           ment survey         survey
Cleveland
  June 1942                    .0              115.7
  July 1942                    .0              115.5
  August 1942               112.1              115.3
Detroit
  June 1942                 112.5              115.1
  July 1942                 111.0              114.9
  August 1942               110.8              114.7
Manchester
  September 1942            107.8              107.6
Portland, Oregon
  September 1942            115.1              115.6


43 U. S. Bureau of Labor Statistics, Bulletin No. 694, I, p. 84.
44 Williams, Rice and Schell, supra, p. 422.
45 U. S. Bureau of Labor Statistics Bulletin No. 699, p. 8.





























under such conditions for treating income taxes differently from direct 
taxes used for financing similar or identical services. 

The inclusion of income taxes in the cost of living index is not a
new idea. The Swedish cost of living index includes income taxation as
one of its components.46 The International Labour Office suggested
definite procedures on the inclusion of income taxes in the index.47
Representatives of the Bureau of Labor Statistics agree that “the
amount of the tax to moderate-income families can, of course, be read-
ily computed.”48 Little reason therefore remains for continuing to ex-
clude the cost of income taxation from the cost of living index.

Social security taxes present a somewhat different problem. Their 
exclusion from the index is justified on the ground that they “have been 
treated as savings.”49 However, social security taxes represent savings
only in part. 

It is interesting in connection with this point, to examine the reason- 
ing of the Bureau of Labor Statistics justifying the exclusion of life- 
insurance premiums from the computation of the cost of living figures 
and for treating them as savings: 


It is recognized that most insurance-policy premiums include payments for 
several elements, only one of which is truly savings. The first is for actual 
cost of life-insurance protection during the year in question. This would 
amount to the cost on an actuarial basis of term insurance for 1 year at the 
actual age of the insured. Such cost is properly current family expenditure 
for insurance protection for the year. Another element is the part of the 
premium which goes toward operating costs of the insurance company. This 
element is especially large in the case of industrial insurance, which covers 
the expense of making weekly collections. This element is also not properly 
savings, but merely a form of current family expenditure. Any amounts 
included in the premium payments in excess of these two items, which ac- 
cumulate in the form of net cashable value of the policy, are truly savings. 
To the extent that policies are allowed to lapse under terms which mean loss 
of payments previously made, even such payments can only doubtfully be 
classed as savings. . . . It is . . . impossible to estimate how much of the
amount paid in life-insurance premiums represents savings and how much
was paid for insurance protection during the year or other services of the
insurance company. The entire amount of such payments has therefore been
treated as a disposition of funds for items other than current family expendi-
ture, an increase in assets, and hence as savings. . . .50


46 Williams, Rice and Schell, supra, p. 422.

47 International Labour Office, Studies and Reports, Series N (Statistics), No. 6: Methods of Com- 
piling Cost of Living Index Numbers, Report prepared for the Second International Conference of Labour 
Statisticians, pp. 45 ff. 

48 Williams, Rice and Schell, supra, p. 422. 

49 U. S. Bureau of Labor Statistics Bulletin No. 699, p. 8.

50 U. S. Bureau of Labor Statistics Bulletin No. 638: Money Disbursements of Wage Earners and
Clerical Workers, 1934-36, Summary Volume by Faith M. Williams and Alice C. Hanson, p. 179. 










It is evident from the above that the only reason for the non-inclu- 
sion of the life-insurance premiums is a technical one: inability to sepa- 
rate those elements in life-insurance premiums which represent the 
current, actual cost of insurance to the policy-holder from those which 
represent savings. Were such a separation possible an allowance would 
have been made. 

It is obvious that this logic holds true for other forms of insurance as 
well. Thus we find that the Bureau’s consumer expenditure studies 
treat automobile accident and health insurance as expenses.51 These
items are incorporated in the cost of living index.52

The analogy between the treatment of insurance taken out with pri- 
vate carriers and social security contributions is perfect. Unemploy- 
ment insurance taxes, in those states where the worker contributes, are 
clearly comparable with term insurance. The payment of taxes buys for 
the worker certain limited insurance protection. There is no paid up 
value. Unless he collects benefits within a benefit year, he has no ac- 
crued rights. It thus becomes a form of current family expenditure. 

On the other hand it can hardly be argued that old age and survivor 
benefits come within the same class. While there is no paid in value, 
these contributions definitely can be characterized as savings. 

Two omissions from the index may be mentioned in brief. One of 
these is represented by restaurant prices. The Bureau had planned to 
incorporate meals eaten away from home into the index during the fiscal 
year 1942.53 To date this reform has not been carried out. The other
omission is the recognition that wage-earners who own their own homes 
have a somewhat different problem from their rent-paying neighbors. 
With the general rise in prices and changes in tax rates the cost of home- 
ownership may follow different trends from rentals. The problem cer- 
tainly deserves investigation. 

In order to secure a national index of the cost of living the Bureau
weights indexes for the individual cities by the population of the given
metropolitan area and that of other cities in the same region and size
class.54 To what extent does an index for a specific city represent the
cost of living changes of a larger area comprised of several towns? A 
check with special studies of cost of living changes made by the Bureau 
of Labor Statistics shows that it is not always representative. Compar- 
ing, for example, the cities of Boston and Bridgeport we find that be- 

51 Ibid., pp. 138 ff. and pp. 158 f.

52 U. S. Bureau of Labor Statistics Bulletin No. 699, p. 23.

53 U. S. Bureau of Labor Statistics Bulletin No. 699, p. 16.

54 Ibid., p. 39. Recently the Bureau modernized its city weights to allow for wartime changes in
population (U. S. Bureau of Labor Statistics, Description of Changes in the B.L.S. Cost of Living Index
as of March 15, 1943).













tween October, 1939, and October, 1942, the following changes in the 
cost of living took place: 


                               Boston55      Bridgeport
                               (Per cent)    (Per cent)
All items                        19.3          22.1
Food                             34.7          33.9
Clothing                         22.7          29.6
Rent                              4.9           6.0
Fuel, electricity and ice        24.1          13.5
House furnishing                 18.7          22.3
Miscellaneous                    10.0          15.8


While a single example does not make the rule, it indicates the de- 
sirability of experimentation and the inclusion of a larger number of 
cities, at least periodically, for the purpose of checking possible devia- 
tions of the national index. 
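
The weighting question can be illustrated with the Boston and Bridgeport "All items" figures above. The sketch assumes hypothetical population weights of 3 and 1 (invented for illustration; they are not the Bureau's weights) and compares letting Boston stand for both cities with weighting each city's own change. The function name is likewise hypothetical.

```python
from typing import Dict


def weighted_change(changes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Population-weighted average of per-city percentage changes."""
    total = sum(weights[city] for city in changes)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(changes[city] * weights[city] for city in changes) / total


# "All items" changes, October 1939 to October 1942 (from the table).
ALL_ITEMS = {"Boston": 19.3, "Bridgeport": 22.1}

# Hypothetical population weights, for illustration only.
ASSUMED_WEIGHTS = {"Boston": 3.0, "Bridgeport": 1.0}

if __name__ == "__main__":
    both = weighted_change(ALL_ITEMS, ASSUMED_WEIGHTS)
    boston_only = ALL_ITEMS["Boston"]  # Boston standing in for both cities
    print(f"each city weighted separately: {both:.1f} per cent")
    print(f"Boston representing both:      {boston_only:.1f} per cent")
```

With these invented weights the separately weighted figure is 20.0 per cent against 19.3 per cent when Boston stands for the whole group; the size of the gap depends entirely on the weights assumed.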


NOTES ON MR. TEPER’S OBSERVATIONS 


By ARYNESS JOY WICKENS
Chief, Prices and Cost of Living Branch
AND FAITH M. WILLIAMS
Chief, Cost of Living Division, Bureau of Labor Statistics


MR. TEPER’S article on “Observations on the Cost of Living Index
of the Bureau of Labor Statistics” expresses ably some of the
doubts existing in a number of quarters concerning the accuracy of the
index, and raises a number of important technical questions. Most of
these will be discussed by the forthcoming report of the Committee of
the American Statistical Association.

The Bureau of Labor Statistics would prefer to let the report of the
Committee speak for itself on most of these points. There are, however,
a number of points which are worthy of note at this time.

(a) The Bureau of Labor Statistics has not attempted to obtain
prices charged on “black markets” outside the regular channels of
trade. The index does, however, contain many prices which are above
Office of Price Administration ceiling levels. For certain cuts of meat,
on a recent occasion just after dollars and cents ceilings had been
established, 25 per cent of the prices reported were above ceilings; for
rents, a recent comparison shows about 4 per cent above ceilings in
certain cities. Moreover, there have been repeated instances of re-
ported prices which increased month after month in the same store,


55 October, 1939, figures (other than food) for Boston were estimated by interpolating between
September and December, 1939, indexes. 
















although they were under the General Maximum Price Regulation and 
should not have advanced. 

It is not possible to say what percentage of the total number of 
quotations in the index is above ceiling levels because the great variety 
of ceiling regulations in different areas and for different products makes 
comparisons difficult. 

Nor is it contended by the Bureau of Labor Statistics that all across- 
the-counter prices above ceilings (as distinguished from bootleg or 
“black market” sales) are reported. Certain special tests made for the 
use of the American Statistical Association Committee indicate, how- 
ever, a close correspondence of actual sales prices charged customers 
with prices given to the representatives of the Bureau. 


(b) Recently, fewer stores have held “white sales,” and the price 
declines have been less than in the pre-war period. This is also reflected 
in the cost of living index. No figures are available to indicate the ex- 
tent to which wage earners and lower salaried groups have in the past 
made use of special sales, or the extent to which they are using them 
now. It is therefore impossible for the Bureau to devise correction fac- 
tors for the cost of living index which would account for the wartime 
reduction in the number of special sales. Where the whole market goes 
down for an extended period of a month or more, as in the case of the 
“white sales,” and the practice is accepted throughout the trade, these 
lower prices are reflected in the Bureau’s monthly index, but without 
added weight. They are treated, in effect, like any other seasonal price 
decline. 

(c) With regard to measuring changes in the serviceability of con- 
sumer goods at different price levels by laboratory tests, the Bureau has 
long been interested in the problem of quality change, but has found 
no satisfactory way to deal with it. Mr. Teper quotes part of a para-
graph from an article which appeared in this JOURNAL in December
1942. That paragraph went on to say that the Bureau had explored at 
some length in the summer and fall of that year the available test ma- 
terial on changes in the serviceability of consumer goods due to the 
use of substitute materials necessitated by the war situation, and had 
found no summary data which could be applied as quantitative factors 
to take statistical account of hidden price changes. Mr. Teper suggests 
that the Bureau should utilize available material on the performance of
consumers’ goods as a corrective factor in the cost of living index. Such 
a procedure would not only be exceedingly difficult, but would require a 
large appropriation for the purchase and testing of the goods priced 
at different periods, and extensive research as to methods of summariz- 



















ing measures of durability, serviceability, etc., as for example, meas- 
ures of tensile strength, resistance to abrasion, air permeability, and 
heat conductivity for overcoating. It is very doubtful, in our opinion, 
if this proposal is practicable. 

In evaluating the desirability of carrying out Mr. Teper’s suggestion, 
even if it were practicable to do so over a long period of time, it is im- 
portant to realize that in periods of technical development when the 
quality of consumers’ goods has been improving the Bureau has faced a 
similar problem of quality change, but in the other direction. In the 
pre-war period the quality of the refrigerators, automobiles and ra- 
dios included in the index was improving but there was no way to 
measure the extent of the improvement of the models priced from year 
to year, and to that extent the index has had an upward bias. 

(d) Mr. Teper raises the question as to whether, with wartime 
changes, the cost of living index measures currently changes in the cost 
of living or in the plane of living as well. When new automobiles and 
other durable goods were dropped from the index, beginning in Janu- 
ary 1942, no substitutions were made except for increases in local 
transportation and automobile repairs, and the durable goods were 
“linked out.” The level of the index did not decline because of the dele- 
tion of these articles. Mr. Teper suggests that the Bureau should pre- 
serve the level of consumption priced in pre-war days and allow equiv- 
alent commodities and services to replace those not obtainable. This is 
a difficult question, for the goods are not available; there is nothing 
equivalent to replace them. Thus, currently the Bureau has sought to 
account for price changes in goods actually available on the market. 

We have frequently urged that the cost of living index should be ac- 
companied by figures on changes in family expenditures during the war 
period. Funds have not been made available for such studies, and the 
Bureau has thus been unable to furnish any measure of changes in the 
plane of living since March 1942. 

(e) Finally, the Bureau agrees that Mr. Teper’s point is well taken 
that it is desirable to provide both the general public and technicians 
with more information about the way in which the cost of living index 
is constructed and how it can be used. Mr. Teper has been assured that, 
with added funds just made available for more extended work on prices 
and the cost of living, the Bureau will make every effort adequately to 
publicize the cost of living index. 

















ANALYSIS OF VARIANCE FOR PERCENTAGES BASED ON 
UNEQUAL NUMBERS* 


By W. G. COCHRAN
Iowa State College 


SEVERAL WRITERS have discussed the use of the analysis of variance
when the data are expressed as fractions or percentages. In the case
where all percentages are based on the same total count (i.e., the same
denominator), the techniques which have been developed appear to be
adequate for most practical purposes. According to the nature of the 
variations present in the data, these techniques take the form of (i) a 
straightforward analysis of variance of the percentages, (ii) an analysis 
of variance of angles into which the percentages have been transformed, 
[Bliss (1938), Cochran (1938), Clark and Leonard (1939)] or (iii) the 
angular transformation with some further refinements [Bartlett (1936) 
(1937), Cochran (1940)]. 

The case in which the percentages are based on different total num- 
bers is more troublesome. As will appear later, there is a greater 
variety of possible methods of analysis, so that much time may be con- 
sumed in trying to decide what is a suitable method. Further, with 
certain types of data the efficient methods are rather tedious. The object 
of this discussion is to suggest approximate preliminary tests which are 
helpful in the choice of a method of analysis that is reasonably efficient 
and not unnecessarily laborious. 


NATURE OF THE VARIATION IN PERCENTAGE DATA 


For convenience in this discussion fractions will be used instead of
percentages. Let $n_i$ be the total count on which an observed fraction
or percentage is based, and let $f_i = a_i/n_i$ be the observed fraction. If $p_i$
is the true fraction for this observation, the variance of $f_i$ about $p_i$ is
given by the usual expression for the binomial distribution: $p_i q_i/n_i$,
where $q_i = (1 - p_i)$.

In addition, the true fraction $p_i$ will frequently be found to vary from
observation to observation. Where present, such variation may con-
tribute to the experimental error on which z, F, or t-tests are made.
Thus for the total variance of an observed fraction we may write

$$V(f_i) = \frac{p_i q_i}{n_i} + \sigma_i^2 \tag{1}$$


* Journal paper No. J-1115 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project 
No. 514. 




where the first term represents the binomial variation and the second
term the extraneous variation.

It is not clear what assumptions may best be made about $\sigma_i^2$. Prob-
ably no single set of assumptions is appropriate for all types of data.
Since, however, $\sigma_i^2$ represents the variance of the true fraction, $\sigma_i^2$ will
not in general depend on $n_i$. It may depend on $p_i$, for more variation
would be expected when the fractions are near 1/2 than when they ap-
proach 0 or 1.

Subject to the assumption, implicit in all uses of the analysis of
variance with fractions, that the fractions are approximately normally
distributed, equation (1) suggests that the observations should be
weighted, where the true weight $W_i$ is

$$W_i = \frac{1}{\dfrac{p_i q_i}{n_i} + \sigma_i^2} = \frac{n_i}{p_i q_i + n_i \sigma_i^2}. \tag{2}$$

When the extraneous variation is small, the weight assigned to any
fraction is $n_i/p_i q_i$; i.e., it increases directly as the total $n_i$ from which
the fraction is derived. At the other extreme, when the values of $\sigma_i^2$ are
large relative to $p_i q_i$, the weight of an observation is $1/\sigma_i^2$, i.e., inde-
pendent of $n_i$.
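
The two limiting cases of the true weight in equation (2) can be checked numerically. The following is a minimal sketch, not part of the original paper; the function name and the numerical inputs are illustrative assumptions.

```python
def true_weight(n: float, p: float, sigma2: float) -> float:
    """True weight W = n / (p*q + n*sigma2), i.e. the reciprocal of
    the total variance p*q/n + sigma2 of an observed fraction."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    q = 1.0 - p
    return n / (p * q + n * sigma2)


if __name__ == "__main__":
    # No extraneous variation: the weight is n/(p*q), proportional to n.
    for n in (10, 20, 40):
        print(n, true_weight(n, 0.5, 0.0))
    # Dominant extraneous variation: the weight approaches 1/sigma2 for any n.
    for n in (10, 1000):
        print(n, true_weight(n, 0.5, 100.0))
```

With no extraneous variation, doubling the count doubles the weight; with a very large extraneous variance, counts of 10 and 1000 receive nearly the same weight.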

The practical procedure which follows from this discussion is to esti- 
mate first the relative amounts of binomial and extraneous variation in 
the data and the nature of the extraneous variation. From this informa- 
tion an appropriate set of weights can be constructed for the subsequent 
analysis. But unless the data are extensive or unless a priori knowledge 
is available, it may be expected that the information for the calculation 
of weights will not be very accurate. Moreover the computation of 
weights is time-consuming. For these reasons it is worthwhile to con- 
sider the adequacy of some simple methods of weighting. 


THE EFFICIENCIES OF BINOMIAL AND EQUAL WEIGHTING 


Two methods to be investigated are (i) weighting proportional to the
$n_i$ and (ii) weighting independent of the $n_i$. As we have seen, these methods
are efficient when the ratio of the binomial to the total variation is 1 and
0 respectively.

For simplicity we will consider the estimation of the mean of a group
of $s$ observations, such as the mean for one treatment over a number of
replications. If weights $w_i$ are used where the true weights are $W_i$, the
variance of the weighted mean is known to be

$$\sum_{i=1}^{s} \frac{w_i^2}{W_i} \bigg/ \left[ \sum_{i=1}^{s} w_i \right]^2. \tag{3}$$



















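Since the variance of a correctly-weighted mean is $1/\sum_{i=1}^{s} W_i$, the
efficiency of the actual mean is

$$\left[ \sum_{i=1}^{s} w_i \right]^2 \bigg/ \left[ \sum_{i=1}^{s} W_i \, \sum_{i=1}^{s} \frac{w_i^2}{W_i} \right]. \tag{4}$$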
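
The step from the variance of the weighted mean to the efficiency can be written out as follows. This is a sketch of the standard argument, assuming independent observations with $V(f_i) = 1/W_i$; the symbol $\bar f_w$ for the weighted mean is introduced here for illustration and does not appear in the original.

```latex
\bar f_w = \frac{\sum_{i=1}^{s} w_i f_i}{\sum_{i=1}^{s} w_i},
\qquad
V(\bar f_w)
  = \frac{\sum_{i=1}^{s} w_i^2 \, V(f_i)}{\left[\sum_{i=1}^{s} w_i\right]^2}
  = \frac{\sum_{i=1}^{s} w_i^2 / W_i}{\left[\sum_{i=1}^{s} w_i\right]^2}.

% Setting w_i = W_i gives the minimum variance 1 / \sum W_i, so that
\text{efficiency}
  = \frac{1 / \sum_{i=1}^{s} W_i}{V(\bar f_w)}
  = \frac{\left[\sum_{i=1}^{s} w_i\right]^2}
         {\sum_{i=1}^{s} W_i \; \sum_{i=1}^{s} w_i^2 / W_i}
  \le 1,
% the inequality following from the Cauchy-Schwarz inequality applied to
% w_i / \sqrt{W_i} and \sqrt{W_i}.
```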





In order to obtain data which can be visualized, a number of sets of
values of $n_i$ were chosen, in some of which the $n_i$ were approximately
equal while in others they varied greatly. It was assumed that $p_i q_i$
and $\sigma_i^2$ were constant for all observations. This assumption is not con-
sidered unduly restrictive since the mean of a single treatment is being
considered. The assumption might fail if, for instance, a treatment in
a randomized blocks experiment gave 95 per cent in some blocks and 40
per cent in others.

[Chart: EFFICIENCIES OF BINOMIAL, EQUAL AND PARTIAL WEIGHTS. Curves labeled Binomial, Partial and Equal; vertical scale 0 to 100; horizontal axis: Percentage of Binomial Variation, 0 to 100.]

With these assumptions we may take $w_i = n_i$ (binomial weighting) in
method (i) and $w_i = 1$ (equal weighting) in method (ii). For each of the
selected sets of $n_i$, the efficiencies of the two weightings were calculated
by formula (4). These efficiencies naturally depend on the ratio of the
binomial to the total variation. Since the average binomial variance is
$pq/n_h$, where $n_h$ is the harmonic mean of the $n_i$, the ratio of the bi-
nomial to the total variance was taken as

$$u = \frac{pq}{n_h} \bigg/ \left( \frac{pq}{n_h} + \sigma^2 \right). \tag{5}$$

Each efficiency was computed for a series of values of $u$, ranging from
0 to 1.

As would be expected, the efficiencies of both methods were always
close to unity when the $n_i$ varied little. In this case either weighting
is satisfactory whatever the relative amount of extraneous variation,
equal weighting being preferable on account of its simplicity. For a
case of extreme variation in the $n_i$, the efficiencies are shown
graphically in the Chart. The third curve (partial weighting) will be
explained later. The values of $n_i$ were 1, 2, 6, 8, 10, 20, 30, 40, 50
and 80, exhibiting a range which is seldom equalled or exceeded in
experimental data.

In the Chart binomial weighting is seen to be more accurate when the 
binomial variation in the data exceeds about 55 per cent and is less 
accurate otherwise. Binomial weighting at its worst (u=0) is superior 
to equal weighting at its worst (u=1). Since this result is always true, 
binomial weighting would appear to be a better method for routine use, 
there being less chance of getting a very inefficient mean than with 
equal weights. Neither method is however satisfactory in the mid-range 
when the binomial variation lies between 30 and 80 per cent of the total 
variation. 
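
The two curves discussed above can be recomputed from formulas (4) and
(5). The Python sketch below is an illustration under stated
assumptions, not the author's computation: pq and the extraneous variance
are taken as constant, the average total variance is scaled to one, and
all function names are hypothetical.

```python
N_I = [1, 2, 6, 8, 10, 20, 30, 40, 50, 80]   # the n_i used for the Chart


def harmonic_mean(n):
    return len(n) / sum(1.0 / x for x in n)


def true_weights(n, u):
    """Correct weights when a fraction u of the average variance is binomial.

    With the average total variance scaled to 1, observation i has binomial
    variance u * n_h / n_i and extraneous variance 1 - u.
    """
    n_h = harmonic_mean(n)
    return [1.0 / (u * n_h / x + (1.0 - u)) for x in n]


def efficiency(w, W):
    """Formula (4): efficiency of weights w when the true weights are W."""
    return sum(w) ** 2 / (sum(W) * sum(wi * wi / Wi for wi, Wi in zip(w, W)))


for pct in range(0, 101, 10):
    W = true_weights(N_I, pct / 100.0)
    e_bin = efficiency(N_I, W)                 # w_i = n_i
    e_eq = efficiency([1.0] * len(N_I), W)     # w_i = 1
    print(f"{pct:3d}  binomial {100 * e_bin:5.1f}  equal {100 * e_eq:5.1f}")
```

Under these assumptions binomial weighting at its worst (u = 0) comes
out near 51 per cent and equal weighting at its worst (u = 1) near 20 per
cent, which agrees with the comparison drawn in the text.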

It should be remembered that the example represents a very extreme
case. For examples which are perhaps more typical, in which $n_i$ varied
over a much less extreme range than used for the illustration shown in
the Chart, the efficiency of binomial weighting appears to exceed 80 per
cent throughout the range, while that of equal weighting may fall to
about 60 per cent. Nevertheless there is need for a better method when
about half the variation is binomial and the $n_i$ are far from constant.


PARTIAL WEIGHTING 


In the numerical examples which were worked, examinations of the
correct weights as given by formula (2) revealed that when about half
the variance is extraneous, the weights tend to change slowly for the
higher values of $n_i$. This feature is illustrated in Table I, where the
correct weights are compared with the values of $n_i$ in the example of
the Chart in the case in which half the variation is extraneous.

TABLE I
BINOMIAL WEIGHTS ($n_i$) AND CORRECT WEIGHTS WHEN u = 0.5

$n_i$ |  1 |   2  |   6  |   8  |  10  |  20  |  30  |  40  |  50  |  80
$W_i$ |  1 | 1.71 | 3.26 | 3.67 | 3.98 | 4.76 | 5.10 | 5.29 | 5.41 | 5.59

All $W_i$ were multiplied by a common factor to make the true and
binomial weight the same (unity) for the first observation. It will be
noted that the increase in $W_i$ for a fixed increase in $n_i$ becomes
steadily and rapidly smaller as $n_i$ increases. Thus when $n_i$ increases
from 1 to 2, $W_i$ increases from 1 to 1.71; yet when $n_i$ increases from
50 to 80, $W_i$ increases only from 5.41 to 5.59. In fact, the extraneous
variation imposes a ceiling on the weight assigned to any observation,
no matter how high $n_i$ is.
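
The ceiling on the weights can be made explicit with a short
calculation. The Python sketch below is illustrative only: it assumes the
true weight has the form 1/(pq/n + sigma^2) with pq taken as 1, and it
obtains sigma^2 from formula (5) with u = 0.5. The variable names are
hypothetical.

```python
N_I = [1, 2, 6, 8, 10, 20, 30, 40, 50, 80]
U = 0.5                                          # half the average variance is binomial

n_h = len(N_I) / sum(1.0 / n for n in N_I)       # harmonic mean of the n_i
extraneous = (1.0 - U) / (U * n_h)               # sigma^2 in units of pq, from formula (5)


def correct_weight(n):
    """True weight 1 / (pq/n + sigma^2), with pq taken as 1."""
    return 1.0 / (1.0 / n + extraneous)


# Rescale so that the first observation (n = 1) has weight 1, as in Table I.
relative = [correct_weight(n) / correct_weight(N_I[0]) for n in N_I]
ceiling = (1.0 / extraneous) / correct_weight(N_I[0])   # limit as n grows without bound

for n, w in zip(N_I, relative):
    print(f"n = {n:2d}   W = {w:.2f}")
print(f"ceiling = {ceiling:.2f}")
```

Under this reading the computed weights agree with Table I to within
about 0.02, and no observation, however large its total, can receive a
weight of six times that of a single trial.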

In these circumstances a simple method of approximating the correct
weights is to assign a fixed upper limit to the weights. Below that limit
the $n_i$ are chosen as weights. This device, which may be called partial
weighting, was previously suggested [Cochran (1937), Yates and
Cochran (1938)] for an analogous problem. At first a weighting was
tried in which the upper limit was chosen as the average of the $n_i$
(24.7). With this rule the values of $w_i$ are 1, 2, 6, 8, 10, 20, 25, 25,
25, 25 respectively. The examples indicated, however, that this weighting
reaches its highest efficiency when there is between 80 and 90 per cent
binomial variation. The best simple convention for u between 30 and
80 per cent is to assign a constant weight to the upper two-thirds of the
$n_i$. For the example cited these weights would be 1, 2, 6, 8, 8, 8, 8,
8, 8, 8. In computation this procedure has the additional advantage of
giving fewer unequal weights to deal with.

The efficiency of this type of partial weighting is shown by the 
dotted curve in the Chart. When u lies between 20 and 80 per cent, 
partial weighting is superior to both binomial and equal weighting. Its 
efficiency is above 90 per cent for any value of u between 10 and 70 per 
cent. 

Partial weighting has an element of arbitrariness in the choice of the 
upper limit. The choice appears to make relatively little difference if 
somewhere between one-half and two-thirds of the observations are 
given the same weight. The latter rule is preferable for general use. 
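
The two-thirds rule can be tried out numerically. The Python sketch
below is an illustration only: `partial_weights` is a hypothetical helper
that encodes one reading of the rule (the largest two-thirds of the
observations, rounded to a whole number, share the smallest total among
them), and the variance model is the one assumed for the Chart (constant
pq and extraneous variance, average total variance scaled to one).

```python
N_I = [1, 2, 6, 8, 10, 20, 30, 40, 50, 80]


def partial_weights(n, fraction=2.0 / 3.0):
    """Cap the weights so that about `fraction` of the observations share one weight."""
    ordered = sorted(n)
    k = max(1, round(fraction * len(n)))   # observations given the common weight
    cap = ordered[len(n) - k]              # smallest total among the k largest
    return [min(x, cap) for x in n]


def true_weights(n, u):
    """Correct weights when a fraction u of the average variance is binomial."""
    n_h = len(n) / sum(1.0 / x for x in n)
    return [1.0 / (u * n_h / x + (1.0 - u)) for x in n]


def efficiency(w, W):
    """Formula (4)."""
    return sum(w) ** 2 / (sum(W) * sum(wi * wi / Wi for wi, Wi in zip(w, W)))


w_partial = partial_weights(N_I)
print(w_partial)
for pct in range(10, 100, 10):
    W = true_weights(N_I, pct / 100.0)
    print(pct,
          round(100 * efficiency(N_I, W), 1),               # binomial
          round(100 * efficiency([1.0] * len(N_I), W), 1),  # equal
          round(100 * efficiency(w_partial, W), 1))         # partial
```

Passing `fraction=0.5` gives the alternative choice mentioned in the
text, in which one-half of the observations receive the common weight.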


PRELIMINARY INVESTIGATION OF DATA TO BE ANALYZED 


We now discuss certain preliminary calculations which assist in
deciding how a batch of data is to be analyzed. The first question to be
considered is: are the variations in the totals $n_i$ sufficiently small
so that they can be ignored? Sometimes the answer, one way or the other,
is obvious on inspection; but frequently with moderate variation in the
totals it is doubtful whether the extra work involved in a weighted
analysis is justified.

If variations in the $n_i$ are ignored when they should be taken into
account as in formula (2), some loss of efficiency results (and in addition
some disturbance of the significance levels for z, F, or t-tests). From the
preceding discussion it is clear that this loss cannot properly be
estimated without knowledge of the relative amounts of binomial and
extraneous variation in the data.

An upper limit to the loss can however be assigned from the values of
the $n_i$. For it may be shown from formula (2) that the loss is greatest
when all the variation is binomial. (The Chart provides an illustration
of this result.) Consequently an upper limit to the loss is obtained by
calculating the efficiency obtained from equal weights on the assumption
that all the variation is binomial.

The purpose of the analysis of variance is presumed to be to estimate 
and test certain group or treatment averages. As might be expected, 
the potential loss of efficiency from equal weights depends on the type 
of classification present in the data (e.g., single grouping, two-fold 
classification or multiple classification.) Only the simplest and most 
common types—the single grouping and the two-fold classification— 
will be considered here. 

For the same data the loss differs also from one type of treatment
comparison to another. An average figure may be obtained by supposing
that the true treatment means are distributed about the general mean
with variance $\sigma_t^2$ and that this variance is the quantity to be
estimated.


Single grouping. With r treatments and s observed fractions for each
treatment, the totals $n_{ij}$ on which the fractions are based may be set
out as in Table II.


TABLE II

                          Treatments
              1            2          ...          r
          $n_{11}$     $n_{21}$       ...      $n_{r1}$
          $n_{12}$     $n_{22}$       ...      $n_{r2}$
            ...          ...          ...        ...
          $n_{1s}$     $n_{2s}$       ...      $n_{rs}$
Totals    $N_{1\cdot}$ $N_{2\cdot}$   ...      $N_{r\cdot}$     $N$




When reduced to a comparable basis, the expected values of the
treatments mean square for the two types of analysis work out as follows:

Binomial weights:

$$pq + \frac{1}{r-1}\Big[N - \frac{N_{1\cdot}^2 + \cdots + N_{r\cdot}^2}{N}\Big]\sigma_t^2. \tag{6}$$

The coefficient of $\sigma_t^2$ is always slightly less than the mean of the
treatment totals $N_{i\cdot}$ (unless these totals are identical). If the
number of treatments exceeds 10, or if the $N_{i\cdot}$ vary little, the
mean of the $N_{i\cdot}$ may be used in (6) as a sufficiently good
approximation.


Equal weights:

$$pq + s\,\bar{n}_h\,\sigma_t^2, \tag{7}$$

where $\bar{n}_h$ is the harmonic mean of the $n_{ij}$, given by the equation

$$\frac{1}{\bar{n}_h} = \frac{1}{rs}\Big(\frac{1}{n_{11}} + \cdots + \frac{1}{n_{rs}}\Big). \tag{8}$$

For the estimation of $\sigma_t^2$ the lower limit to the efficiency of equal
weighting is therefore

$$s(r-1)\,\bar{n}_h \Big/ \Big[N - \frac{N_{1\cdot}^2 + \cdots + N_{r\cdot}^2}{N}\Big]. \tag{9}$$

If this ratio exceeds 0.9 it may be concluded that variations in the
$n_{ij}$ can be neglected unless the data require the best possible analysis.


Equal weights within treatments. It sometimes happens that the totals
$n_{ij}$ are substantially constant within each treatment, but change
considerably from one treatment to another. In this case, although a
completely equal weighting is possibly unsatisfactory, an analysis in
which each observation is weighted by the average $n_{ij}$ for that
treatment may be highly efficient. The lower limit to the efficiency of
this type of weighting relative to binomial weighting may be shown to be

$$(r-1)s^2/\lambda \tag{10}$$

where $\lambda$ is the rather complex expression

$$\lambda = \Big(N_{1\cdot} - \frac{N_{1\cdot}^2}{N}\Big)\Big(\frac{1}{n_{11}} + \cdots + \frac{1}{n_{1s}}\Big) + \cdots + \Big(N_{r\cdot} - \frac{N_{r\cdot}^2}{N}\Big)\Big(\frac{1}{n_{r1}} + \cdots + \frac{1}{n_{rs}}\Big). \tag{11}$$

A close approximation to (10) which is more quickly calculated is

$$rs^2/\mu \tag{12}$$

where

$$\mu = N_{1\cdot}\Big(\frac{1}{n_{11}} + \cdots + \frac{1}{n_{1s}}\Big) + \cdots + N_{r\cdot}\Big(\frac{1}{n_{r1}} + \cdots + \frac{1}{n_{rs}}\Big). \tag{13}$$


TABLE III

NUMBER OF SERVICES (n), NUMBER OF CONCEPTIONS (a) AND PERCENTAGE
OF CONCEPTIONS TO SERVICES IN ARTIFICIAL INSEMINATION TESTS

                                          Bull
              1             2             3             4             5             6
Sample     n   a   %     n   a   %     n   a   %     n   a   %     n   a   %     n   a   %
     1    13   6  46    10   7  70    23  12  52     3   1  33    12   5  42    17   6  35
     2    13   4  31     8   3  38    16   7  44    15   7  47    11   7  64    22  15  68
     3    19   7  37     9   4  44    14   8  57     9   3  33    16   8  50    39  23  59
     4     6   3  50     8   5  62    15   6  40    14   3  21    16  11  69    16   6  38
     5    13   8  62     1   1 100     6   5  83    10   7  70    13  10  77    21  12  57
     6     9   3  33     2   0   0    15  10  67     5   4  80    16  13  81     7   6  86
     7     2   0   0     5   1  20     3   1  33     4   3  75    15  13  87    17  13  76
     8     9   5  56     1   0   0     6   3  50     8   4  50     8   5  62    14   8  57
     9    10   3  30     8   5  62    14   9  64    24  11  46     8   6  75    34  10  29
    10     7   3  43    17  10  59    23  16  70    21   3  14     2   2 100    30  18  60

 Total   101  42  42    69  36  52   135  77  57   113  46  41   117  80  68   217 117  54

Σ(rec.)     1.4151        3.3449        1.0924        1.3468        1.2553        0.5758

Grand totals: (n), 752; (a), 398; (rec.), 9.0303


NUMERICAL EXAMPLE 


The practical use of these formulae may be illustrated by a numerical
example.¹ In this investigation a number of semen samples were taken
from each of a number of bulls. With each sample one or more cows
were inseminated artificially. The data in Table III show the number
of inseminations ($n_{ij}$) and the number of conceptions ($a_{ij}$) for
every sample. The fraction $a_{ij}/n_{ij}$ measures the success of the
sample in artificial insemination and is the variable to be studied. As a
preliminary step in the investigation it is desired to test the differences
among bulls. The original data contain the results for 21 samples for each
of 17 bulls. For illustration 6 bulls were chosen, and for each bull 10 of
the 21 samples were selected at random.

The numbers of services fluctuate widely both within and among 
bulls. If nearly all the variation is binomial, neither equal weighting nor 
equal weighting within treatments appears satisfactory at first sight, 
though the latter would be expected to be somewhat more efficient 
than the former. 

The first step in applying the preliminary tests is to record the sum
of the reciprocals of the $n_{ij}$ for each treatment (i.e. bull), as shown
below the treatment totals (1.4151 etc.). With the aid of a table of
reciprocals these figures can be summed directly on the calculating
machine. Their total over all treatments is 9.0303.

1 I am indebted to Dr. G. W. Salisbury, Department of Animal Husbandry, Cornell University, for
permission to use these data.

Three auxiliary quantities are now calculated as follows:

(i) $N - (N_{1\cdot}^2 + \cdots + N_{r\cdot}^2)/N = 752 - [(101)^2 + \cdots + (217)^2]/752 = 610.07$

(ii) $\bar{n}_h = 60/(9.0303) = 6.644$,
where 60 = rs is the total number of samples tested.

(iii) $\mu = 101 \times 1.4151 + 69 \times 3.3449 + \cdots + 217 \times 0.5758 = 945.2$.

The last quantity is required for the approximate test in formula (12)
of equal weights within groups and need not be calculated if the
treatment totals $N_{i\cdot}$ are all approximately the same.

From formula (9), the efficiency for equal weights is

10 × 5 × 6.644/610.07 = .544.

From formula (12), the efficiency for equal weights within groups is
approximately

6 × 100/945.2 = .635.

The exact value, by the more laborious formula (10), is found to be
0.628.
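
These preliminary figures can be checked from the numbers of services
in Table III. The Python sketch below is illustrative; the function and
dictionary keys are hypothetical names, and formulas (11) and (13) are
taken in the form implied by the arithmetic of quantities (i) to (iii).

```python
# Numbers of services n_ij from Table III: one list per bull, ten samples each.
N_IJ = [
    [13, 13, 19, 6, 13, 9, 2, 9, 10, 7],
    [10, 8, 9, 8, 1, 2, 5, 1, 8, 17],
    [23, 16, 14, 15, 6, 15, 3, 6, 14, 23],
    [3, 15, 9, 14, 10, 5, 4, 8, 24, 21],
    [12, 11, 16, 16, 13, 16, 15, 8, 8, 2],
    [17, 22, 39, 16, 21, 7, 17, 14, 34, 30],
]


def preliminary_efficiencies(n_ij):
    """Lower limits to efficiency for a single grouping, all variation binomial."""
    r, s = len(n_ij), len(n_ij[0])
    totals = [sum(row) for row in n_ij]                     # N_i.
    recips = [sum(1.0 / n for n in row) for row in n_ij]    # sum of 1/n_ij per treatment
    N = sum(totals)
    spread = N - sum(t * t for t in totals) / N             # quantity (i)
    n_h = r * s / sum(recips)                               # quantity (ii), formula (8)
    mu = sum(t * c for t, c in zip(totals, recips))         # quantity (iii), formula (13)
    lam = sum((t - t * t / N) * c for t, c in zip(totals, recips))   # formula (11)
    return {
        "equal": s * (r - 1) * n_h / spread,                # formula (9)
        "within_approx": r * s * s / mu,                    # formula (12)
        "within_exact": (r - 1) * s * s / lam,              # formula (10)
    }


for name, value in preliminary_efficiencies(N_IJ).items():
    print(f"{name}: {value:.3f}")
```

Because the reciprocals are carried to full precision here rather than
to four decimals, the last digit may differ slightly from the hand
calculation.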

Thus if the variation within bulls is entirely binomial, the analyses 
by equal weights and by equal weights within treatments are equiva- 
lent to discarding nearly 50 and 40 per cent of the information respec- 
tively, a procedure which is unjustified for anything but a crude exami- 
nation. This result does not however dispose of the question of weight- 
ing. Since the samples from the same bull were tested on different 
cows at different times and in different places, there is no a priori 
reason to suppose that the probability of securing a conception is con- 
stant for all samples from a single bull. The estimation of the percentage 
of extraneous variation, which requires an analysis of variance of the 
actual percentages of conceptions, constitutes the next step. 


ESTIMATION OF THE RELATIVE AMOUNTS OF BINOMIAL AND EXTRANEOUS 
VARIATION 


Either equal weights or binomial weights may be used. If on inspection
most of the variation appears to be extraneous the former are advisable,
while if most of the variation appears to be binomial the latter
are preferable unless the weighted analysis is considered too complicated.
The reasons for these choices are: (i) theoretically we expect a
better estimate when the weighting used is closer to the correct weighting
and (ii) if the preliminary guess turns out to be approximately correct,
the preliminary analysis may be satisfactory for tests of significance.
In this case no further analysis will be needed.

When extraneous variation is present, the error mean square is
composed of binomial variation, extraneous variation and random
fluctuations. For a simple group comparison, the expected value of the
error mean square of the unweighted fractions may be taken as

$$E(s^2) = \frac{pq}{\bar{n}_h} + \sigma_e^2 \tag{14}$$

where $p$ is the average of the true fractions and $\sigma_e^2$ is the average
extraneous variance. Actually this result is not quite correct, since the
expectation depends on the unknown values of the true fractions $p_{ij}$
for each observation. In equation (14) the average fraction $p$ has been
substituted for these unknowns. The result should be sufficiently
accurate for its purpose.

Thus if $s^2$ is the error mean square, the ratio of the binomial to the
total variance is estimated as

$$f(1-f)/(s^2\,\bar{n}_h), \tag{15}$$

where $f$ is the mean observed fraction.

If a weighted analysis is used, with weights $n_{ij}$, the error mean
square $s_w^2$ is an estimate of

$$pq + \frac{\sigma_e^2}{r(s-1)}\Big[N - \frac{n_{11}^2 + \cdots + n_{1s}^2}{N_{1\cdot}} - \cdots - \frac{n_{r1}^2 + \cdots + n_{rs}^2}{N_{r\cdot}}\Big]. \tag{16}$$

The coefficient of $\sigma_e^2$ is slightly smaller than the arithmetic mean
$\bar{n}$ of the $n_{ij}$. Unless the number of observations or replications
per treatment is less than 4, it is usually sufficiently accurate to use
$\bar{n}$ in (16), writing

$$E(s_w^2) = pq + \bar{n}\,\sigma_e^2. \tag{17}$$


Since the average binomial variance has been defined as $pq/\bar{n}_h$, the
estimates of the two components of variance are:

Binomial: $f(1-f)/\bar{n}_h$

Extraneous: $[s_w^2 - f(1-f)]/\bar{n}$


As a rough working rule it is suggested that if less than 30 per cent
of the variation is binomial, equal weights may be employed for the
analysis on which tests of significance are based. Binomial weighting is
satisfactory if more than 80 per cent of the variation is binomial. In the
intervening range partial weights may be used; or if the data are
sufficiently extensive and important to warrant the additional work, the
actual weights $W_i$ in formula (2) may be estimated from the
computations which have been carried out.

It should however be remembered that unless the number of degrees
of freedom for error is large, the estimate of the percentage of binomial
to total variation is itself subject to a large sampling fluctuation. Even
if $p$ were known exactly, the estimated percentage of binomial variation
would at best be distributed as $eB/\chi^2$, where $B$ is the true percentage
of binomial variation, $e$ is the number of error degrees of freedom, and
$\chi^2$ has $e$ degrees of freedom. For example, if a batch of data shows 50
per cent of binomial variation and the error has 20 degrees of freedom, the
95 per cent lower and upper limits to the true percentage are at least
as far apart as 27 per cent and 78 per cent. With 30 degrees of freedom
for error, this range narrows to 30-73 per cent. It may be concluded
that with the moderate numbers of degrees of freedom which are common
in experimental work, attempts to estimate the true weights $W_i$
are scarcely justified, though the technique probably does permit the
discrimination suggested between equal weights, binomial weights and
partial weights.
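
The limits quoted above follow from inverting the relation between the
estimate and chi-square. The Python sketch below is an illustration
only; it assumes that each limit is a one-sided 95 per cent limit (so
that the 5 and 95 per cent points of chi-square are used), with the
quantiles copied from standard chi-square tables. The names are
hypothetical.

```python
# 5 and 95 per cent points of chi-square, copied from standard tables
# (an assumption of this sketch: each limit is taken as one-sided).
CHI_SQUARE_POINTS = {20: (10.851, 31.410), 30: (18.493, 43.773)}


def limits_for_true_percentage(estimate, e):
    """Limits for the true percentage B when estimate = e * B / chi-square."""
    lower_point, upper_point = CHI_SQUARE_POINTS[e]
    return estimate * lower_point / e, estimate * upper_point / e


for e in (20, 30):
    low, high = limits_for_true_percentage(50.0, e)
    print(f"e = {e}: {low:.1f} to {high:.1f} per cent")
```

The results agree with the figures in the text to within about one
percentage point.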


APPLICATION TO THE NUMERICAL EXAMPLE 


In order to form an initial judgment about the extent of extraneous 
variation, the percentages of conceptions in Table III were inspected. 
Although the percentages vary widely for the same bull when the num- 
ber of inseminations is small, those percentages which are based on 
larger numbers of inseminations seem fairly stable. It was concluded as 
a first guess that the extraneous variation is small relative to the bi- 
nomial variation. A weighted analysis was therefore made. 

This analysis can be carried out on the fractions by the familiar
“(total)²/(number)” rule. For the sums of squares, we have

Total: $\frac{6^2}{13} + \frac{4^2}{13} + \cdots + \frac{18^2}{30} - \frac{398^2}{752} = 24.058$

Among bulls: $\frac{42^2}{101} + \cdots + \frac{117^2}{217} - \frac{398^2}{752} = 6.033.$

The error (i.e., within bulls) mean square is 0.334 with 54 degrees of
freedom.








The additional figures required are:

$\bar{n} = 752/60 = 12.53$; $\bar{n}_h = 6.644$; $f = 398/752 = .5293$
Binomial variance $= f(1-f)/\bar{n}_h = .0375$
Extraneous variance $= [s_w^2 - f(1-f)]/\bar{n} = .0068$.

The ratio of binomial to total variance is estimated as 375/443, or
about 85 per cent. From this result binomial weighting appears adequate.

To illustrate the alternative approach, the following is the unweighted
analysis of variance of the percentages.

                         d.f.    Sums of squares    Mean squares
Among bulls                5          6,274             1255
Within bulls (error)      54         24,144              447

The unweighted mean of the percentages is 52.42. From formula (15)
this gives (52.42)(47.58)/(447)(6.644), or 84 per cent of binomial
variation. The agreement between the two estimates is closer than will
normally be found.

Further evidence on the superiority of binomial over equal weighting
is provided by the F-ratios, which are 3.61 for the former and 2.80 for
the latter.


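
Both analyses can be reproduced from the counts in Table III. The
Python sketch below is illustrative only; the variable names are
hypothetical, the unweighted analysis is carried out on exact fractions
rather than on rounded percentages, and small differences from the
printed sums of squares are therefore to be expected.

```python
N_IJ = [  # services, one list per bull (Table III)
    [13, 13, 19, 6, 13, 9, 2, 9, 10, 7],
    [10, 8, 9, 8, 1, 2, 5, 1, 8, 17],
    [23, 16, 14, 15, 6, 15, 3, 6, 14, 23],
    [3, 15, 9, 14, 10, 5, 4, 8, 24, 21],
    [12, 11, 16, 16, 13, 16, 15, 8, 8, 2],
    [17, 22, 39, 16, 21, 7, 17, 14, 34, 30],
]
A_IJ = [  # conceptions, same layout
    [6, 4, 7, 3, 8, 3, 0, 5, 3, 3],
    [7, 3, 4, 5, 1, 0, 1, 0, 5, 10],
    [12, 7, 8, 6, 5, 10, 1, 3, 9, 16],
    [1, 7, 3, 3, 7, 4, 3, 4, 11, 3],
    [5, 7, 8, 11, 10, 13, 13, 5, 6, 2],
    [6, 15, 23, 6, 12, 6, 13, 8, 10, 18],
]
r, s = len(N_IJ), len(N_IJ[0])
N = sum(map(sum, N_IJ))
A = sum(map(sum, A_IJ))
f = A / N                                            # mean observed fraction
n_bar = N / (r * s)                                  # arithmetic mean of the n_ij
n_h = r * s / sum(1.0 / n for row in N_IJ for n in row)   # harmonic mean

# Weighted analysis (weights n_ij) by the "(total)^2/(number)" rule.
correction = A * A / N
total_ss = sum(a * a / n for ra, rn in zip(A_IJ, N_IJ) for a, n in zip(ra, rn)) - correction
bulls_ss = sum(sum(ra) ** 2 / sum(rn) for ra, rn in zip(A_IJ, N_IJ)) - correction
error_ms_w = (total_ss - bulls_ss) / (r * (s - 1))
F_w = (bulls_ss / (r - 1)) / error_ms_w
binomial_var = f * (1 - f) / n_h
extraneous_var = (error_ms_w - f * (1 - f)) / n_bar
u_weighted = binomial_var / (binomial_var + extraneous_var)

# Unweighted analysis of the fractions a_ij / n_ij.
fr = [[a / n for a, n in zip(ra, rn)] for ra, rn in zip(A_IJ, N_IJ)]
grand = sum(map(sum, fr))
mean_f = grand / (r * s)
corr_u = grand ** 2 / (r * s)
total_u = sum(x * x for row in fr for x in row) - corr_u
bulls_u = sum(sum(row) ** 2 / s for row in fr) - corr_u
error_ms_u = (total_u - bulls_u) / (r * (s - 1))
F_u = (bulls_u / (r - 1)) / error_ms_u
u_unweighted = mean_f * (1 - mean_f) / (error_ms_u * n_h)   # formula (15)

print(f"weighted:   error MS {error_ms_w:.3f}, F {F_w:.2f}, binomial share {u_weighted:.2f}")
print(f"unweighted: error MS {error_ms_u:.4f}, F {F_u:.2f}, binomial share {u_unweighted:.2f}")
```

The unweighted mean squares are in squared fractions; multiplying by
10,000 puts them on the percentage scale used in the table above.
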
THE ANGULAR TRANSFORMATION 


Thus far nothing has been said about the use of the angular
transformation, in which the analysis of variance is performed on the
transformed variate $y = \sin^{-1}\sqrt{f}$. For it seems best to settle the
question of weighting before considering the advisability of a
transformation.

The angular transformation was devised for the case in which all the
variation is binomial with constant $n$. The effect of the transformation
is to change the weights $n/(p_i q_i)$, which apply to $f$, into a set of
approximately constant weights $n/821$ applicable to the angles $y$
(expressed in degrees).

When the $n_i$ vary and extraneous variation is present, the effect of
the transformation is more complicated. Suppose however that the
extraneous variance $\sigma_i^2$ is of the form $\lambda p_i q_i$. The correct
weights for $f$ and for $y$ are then:

$$W(f_i) = \Big(\frac{1}{p_i q_i}\Big)\Big(\frac{n_i}{1 + \lambda n_i}\Big), \qquad W(y_i) = \Big(\frac{1}{821}\Big)\Big(\frac{n_i}{1 + \lambda n_i}\Big).$$

These results illustrate the fact that the transformation equalizes
approximately any system of variances which are proportional to
$p_i q_i$, but does not alter the extent to which the weights vary with the
totals $n_i$. It is for this reason that we decide first how the weights
shall vary with $n_i$. If an angular transformation is subsequently made,
the weighting adopted should be used with the angles.

The relation $\sigma_i^2 = \lambda p_i q_i$ imposes a restrictive assumption on
the nature of the extraneous variation. The assumption is perhaps not
unreasonable for data where the variance is greatest at 50 per cent and
smaller towards both ends of the range.

It seems therefore that the usual rules for deciding whether to use 
angles may be applied after the weights (equal, binomial or partial) 
have been chosen. If the percentages vary widely, especially as between 
different treatments, and if the error variance appears to be greatest 
around 50 per cent, the transformation may provide a more homogene- 
ous estimate of error. In the numerical example a transformation was 
not considered necessary in view of the fact that the average percentage 
of conceptions for the different bulls ranged only from 41 per cent to 
68 per cent. 
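
The constant 821 and the behaviour of the weights under the
transformation can be examined numerically. The Python sketch below is
an illustration, not part of the paper: the function names are
hypothetical, the symbol `lam` stands for the assumed proportionality
between the extraneous variance and pq, and the weight formulas are
obtained from that assumption by the usual first-order approximation.

```python
import math

DEG = 180.0 / math.pi
ANGULAR_CONSTANT = DEG ** 2 / 4.0     # about 821: n times the variance of y in degrees


def angle(f):
    """Angular transform of a fraction, in degrees."""
    return DEG * math.asin(math.sqrt(f))


def exact_angle_variance(n, p):
    """Exact variance of the angle when a is binomial (n, p) and f = a/n."""
    probs = [math.comb(n, a) * p ** a * (1 - p) ** (n - a) for a in range(n + 1)]
    mean = sum(pr * angle(a / n) for a, pr in enumerate(probs))
    return sum(pr * (angle(a / n) - mean) ** 2 for a, pr in enumerate(probs))


def weight_f(n, p, lam):
    """Weight for the fraction f when the extraneous variance is lam * p * q."""
    return (1.0 / (p * (1 - p))) * n / (1.0 + lam * n)


def weight_y(n, lam):
    """Approximate weight for the angle y under the same assumption."""
    return (1.0 / ANGULAR_CONSTANT) * n / (1.0 + lam * n)


print(round(ANGULAR_CONSTANT, 1))
for p in (0.3, 0.5):
    print(p, round(exact_angle_variance(50, p), 2), round(ANGULAR_CONSTANT / 50, 2))
for n in (5, 20, 80):
    print(n, round(weight_y(n, lam=0.05), 5))
```

The weights for the angles no longer depend on p, but they still rise
with n toward a ceiling, which is why the choice of weighting is settled
before the transformation is considered.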


TWO-WAY CLASSIFICATIONS 


The principles outlined above apply also to two-way classifications
(as in a randomized block experiment), though the detailed application
is more complicated. The chief practical change is an increase in the
attractiveness of equal weighting. With a given amount of variation in
the $n_{ij}$, the efficiency of equal weighting is usually somewhat greater
than in the case of a simple grouping. In addition, it has been shown
(Yates, 1934) that a weighted analysis cannot be carried out by the
elementary methods of the analysis of variance, because the sums of
squares for the rows and columns of the classification are in general
non-orthogonal. In order to test either rows or columns against their
interaction, a set of multiple regression equations must be solved. For
this calculation the procedure described for a continuous variate by
Yates (1934) may be followed, provided that the weights $w_{ij}$ are
substituted for Yates’s totals $n_{ij}$. Except in special cases, there will
be no “within-classes” sum of squares.

Some modifications are required in the formulae used in the
preliminary tests. According to the nature of the data, we may wish to
consider one of three simple methods of weighting, each of which avoids
the labor of a least squares solution: (1) equal weights; (2) equal weights
within treatments; (3) equal weights within rows or replications. On the
assumption that all the variation is binomial, the minimum values for
the efficiencies of these methods are as follows:

Equal weights:

$$s(r-1)\,\bar{n}_h/\nu \tag{20}$$

Equal weights within treatments:

$$rs^2\big[N - (N_{1\cdot}^2 + \cdots + N_{r\cdot}^2)/N\big]/\mu\nu \tag{21}$$

Equal weights within replicates:

$$r(r-1)N^2/\theta\nu \tag{22}$$

where

$$\nu = N - \frac{n_{11}^2 + \cdots + n_{r1}^2}{N_{\cdot 1}} - \cdots - \frac{n_{1s}^2 + \cdots + n_{rs}^2}{N_{\cdot s}}, \tag{23}$$

$$\theta = N_{\cdot 1}^2\Big(\frac{1}{n_{11}} + \cdots + \frac{1}{n_{r1}}\Big) + \cdots + N_{\cdot s}^2\Big(\frac{1}{n_{1s}} + \cdots + \frac{1}{n_{rs}}\Big). \tag{24}$$

The quantities $\nu$ and $\theta$ involve respectively sums of the squares of
$n_{ij}$ and of the reciprocals of $n_{ij}$ taken along the rows. Formula (21)
is an approximation, $\mu$ having the value defined in formula (13).

The use of any of these formulae requires a certain amount of
preliminary calculation. Where data of this type are frequently
encountered, the computer soon becomes experienced in estimating the
efficiencies simply by inspection of the values of the $n_{ij}$, so that in
course of time the calculations need seldom be made.

Should none of these methods be considered satisfactory, the rela- 
tive amounts of binomial and extraneous variation must be estimated. 
Unless it appears fairly certain that the greater part of the variation is 
binomial, estimation from an equally weighted analysis of variance 
is preferable on account of the saving in time. With equal weights, for- 
mula (14) remains valid for the expected value of the error mean square 
(in this case the treatments times replicates interaction). When bino- 
mial weights are used and the least squares analysis is carried out, the 
expected value of the error mean square is cumbersome. The approxi- 
mate formula (17) appears to be sufficiently accurate for practical use; 
it underestimates slightly the relative amount of binomial variation. 


SUMMARY OF SUGGESTED PROCEDURE 


1. Consider whether one of the simplest methods of analysis (equal 
weights, equal weights within treatments, equal weights within repli- 
cates) can be used without further investigation. If in doubt apply the 
appropriate test. 

2. If none of the methods in (1) appears satisfactory, estimate the 
relative amounts of binomial and extraneous variation from an analysis 
of variance of the fractions or percentages. 




3. As a rough discriminatory rule, adopt binomial weights for the 
subsequent analysis if more than 80 per cent of the variation is bi- 
nomial, and equal weights if less than 30 per cent of the variation is bi- 
nomial. In the intervening range use partial weighting unless the data 
are extensive and important enough to justify the estimation of the ex- 
act weights for every observation. 

With a two-way classification it may be advisable under pressure of 
time to weight equally when as much as 50 per cent of the variation is 
binomial. In the examples which I have investigated, the loss of effi- 
ciency in this case was less than half the loss when all the variation is 
binomial. 

4. When a method of weighting has been adopted, consider whether 
an angular transformation is advisable. 

This procedure can be compressed as experience accumulates. 

A discussion of the effects of erroneous weights on the validity of 
z, F or t-tests is beyond the scope of this paper. It appears that the more 
closely the weights used approach the true weights, the smaller is the 
disturbance in levels of significance. Consequently a procedure designed 
to choose weights that are approximately correct should also lead to an 
analysis of variance for which the tabulated significance levels are ap- 
proximately valid. 

REFERENCES 
Bartlett (1936). Journal of the Royal Statistical Society, Supplement, 3, 68. 
Bartlett (1937). Journal of the Royal Statistical Society, Supplement, 4, 168. 
Bliss (1938). Ohio Journal of Science, 38, 9. 
Clark and Leonard (1939). Journal of the American Society of Agronomy, 31, 55. 
Cochran (1937). Journal of the Royal Statistical Society, Supplement, 4, 102. 
Cochran (1938). Empire Journal of Experimental Agriculture, 6, 157. 
Cochran (1940). Annals of Mathematical Statistics, 11, 335. 


Yates (1934). This Journal, 29, 60.
Yates and Cochran (1938). Journal of Agricultural Science, 28, 556. 








“RETRO” CHARTS 


By Karl Karsten and Edith Brooks


It is the purpose of this paper to introduce a family of useful
chart-forms, discussing the necessary and sufficient conditions for their
use, the main features of these charts, the different variations which
are possible, and the use of these charts in collections where
standardization is desirable.¹

A census² of chart-literature indicates that the majority of
widely-circulated charts show curves of time-series, and that in the greater
part of these the reader is expected to be more interested in recent events 
than in the remote past. In this large group of charts the chart-maker 
can be said to play the role of historian. 

And here the folkways of chart-making sometimes confine us in 
methodological rigidities which are absurd. Wouldn’t it be silly if the 
historian were forced to cover a uniform number of years per page 
throughout his book? And if we are making a historical chart, may 
there not be occasions when it is just as silly for us to use equal dis- 
tances for equal units of time? When such rigidity causes an error in 
the placing of emphasis, it certainly makes a poor chart. If the error 
causes deception, it makes a misleading chart. Both results are explain- 
able, but not necessarily excusable, on grounds of convenience or of 
custom. 

Custom has not so shackled the historian. For him it is accepted 
practice to devote much space to important periods of time and cover 
briefly what is unimportant. This treatment is realistic. It omits unde- 
sired details. It meets the reader’s needs and interests. 

The statistician often seeks escape from these rigidities which the 
historian has never known. He may try the device of breaking his time- 
series chart into two parts, one of which shows yearly data, for exam- 
ple, over a long period on a condensed scale, while the other part shows 
monthly figures, let us say, for only a few years on an enlarged scale. 
He gives both parts the same vertical scale, he joins them together at 
a common meeting-point of time without overlapping, and lo and be- 
hold, he has achieved great detail for the recent past and great coverage 
for the remote past. 


1 Previous contributions on this subject have been incomplete. See Karl Karsten, Charts and Graphs 
(Prentice-Hall, Inc., New York, 1925), pp. 474-5. The present writers are indebted to Mr. Thomas 
Blaisdell and Mr. Spurgeon Bell of the National Resources Planning Board for the opportunity to de- 
velop and explore the use of “retro” charts for collections of standardized charts. We are indebted to 
Col. Lewis Sanders for his insistence that we try out the simple perspective type of “retro” chart. 

2 One sampling of 274 charts from 26 publications (of trade conditions) showed 97 per cent were 
charts of historical time-series and only 3 per cent belonged to other types. Of the time-series charts, 
63 per cent covered between 5 and 50 years and would be appropriate subjects for the present discussion; 




The existence of such segmented double-gaited charts is eloquent 
evidence of the conflict between rigidities and requirements. But it is 
not a satisfactory solution of the conflict. The chart requires difficult 
mental acrobatics when reading the curve or curves from one part to
the other.³ It has other disadvantages. In view of the conflict, however,
it is an ingenious compromise. 


MAIN FEATURES 


In many respects a better compromise is afforded by the family of 
“retro” or retrospective charts here proposed. These charts avoid the 
dreary expanse of the orthodox grid and the unbridged saltation of the 
segmented device. Like the latter, they differentiate between the recent 
and the remote past in detail and emphasis. Like the former, they pre- 
serve unbroken continuity between all adjacent portions of the chart. 
In short, they have the advantages of both and the disadvantages of 
neither. 

CHART I 


SEVEN “RETRO” CHARTS—PERSPECTIVE TYPE 
CONVERTED INTO IMAGES OF SOLIDS BY USE OF A THIRD DIMENSION 
(with Arithmetical Vertical Scale) 
In this chart the indexes of the value of construction in the United States are shown for the past 
three years by seven curves each of which forms a separate retrospective chart. 


[Chart I: seven perspective-type “retro” curves, one for each class of
construction, each drawn as a solid against a vertical scale marked at
100 per cent.]


Moreover, these charts introduce a graduated and continuous change 
in degree of emphasis and detail. Reading from right to left, we find 
the units of time continuously dwindling in size just as the memory of 
details fades, in life, with the passage of time. Strong argument can 
be offered that in this feature the “retro” chart provides a clearer and 
more truthful picture of past events seen in retrospect as a connected 
entirety, than is possible with the orthodox chart. It is not possible to 
design the “retro” chart without the implicit addition of a third dimen- 
sion, which makes the past become dynamic, realistic, alive. (See Chart 
I.) 


3 It reminds one of the story by Jack London about a thief who evaded detection by shifting suddenly
from the leisurely tread of a club member to the hurried pace of a waiter.










TRUE PERSPECTIVE 


Conceptually the simplest “retro” chart is that which adheres faith- 
fully to the rules of linear perspective and presents a perspective ren- 
dering in three dimensions. Imagine that the curve of the past has been 
drawn in the orthodox way on a very large scale, if you please, along the 
face of a high fence. Let us stand off from one end of this fence and view 


CHART II 
“RETRO” CHART—PERSPECTIVE TYPE 
(with Logarithmic Vertical Scale) 


In this chart, showing average monthly Federal Government Receipts over the last 50 years, it is 
interesting to observe that without difficulty transition has been made from annual averages to quarterly 
moving averages in 1930, and from these to monthly moving averages in 1939. 


[Chart II: image not reproduced. Legible labels include $25,000,000 on the vertical scale and 1936 and 1940 on the time scale.]


it from a position in front of the latter portion, just as Tom Sawyer 
must have seen the whitewashed fence as he stood off and admired it. 
At that moment the “retro” chart emerges, complete and inevitable. 
In it we view the past from one end only, as in life itself. (See Chart II.) 

Among the disadvantages of this chart is the fact that its grid is laborious
to construct.⁴ Moreover its vertical scale changes as one moves


4 To construct the perspective-“retro” scale, the following procedure may be used:
(1) Decide upon the total number, N, of time units to be shown, also the total space, S, in inches 


























across the chart,⁵ so that magnitudes must be plotted by inspection, or
measured precisely along some abscissa where a ruler conveniently
fits, and transferred to the proper abscissa by triangulation through the
vanishing point.⁶

As to the slopes of the curves, it is noteworthy that straight lines on
orthodox charts are straight lines on these charts.⁷ But the angle of
slope is not the same for different straight lines which would be parallel 
on orthodox charts. The reader must obviously compare the slope of 
each rise or decline with its background grid (the slope of adjacent 
ordinate rulings) before reading significance into the slope of the curve. 
Measured from that background grid, the angle of slope has the same 
significance as in orthodox charts. (See Chart III.) 


OTHER TRIANGULAR FORMS 


Many variations of the triangular “retro” chart can be made, either 
on logical grounds or on grounds of convenience in drafting. These 





to be covered by these time units on the “retro” scale, and the ratio, R, such that 1/N<R<1.0, 
between the space given to the most recent time unit and the total space, S. (This ratio, of course, 
determines the extent of variation of the “retro” scale from the orthodox scale.) 


(2) Compute the constant

$$Q = \frac{N - RN}{RN - 1.0}$$

and the values of the following series:

$$a = 0;\quad b = \frac{1}{Q+1};\quad c = \frac{2}{Q+2};\quad d = \frac{3}{Q+3};\ \ldots;\quad n = \frac{N}{Q+N}.$$

(The constant, Q, corresponds to the usual distance from the eye to the picture in the perspective
rendering; it serves to obtain the desired ratio, R, of variation from the orthodox scale.)

(3) Compute the vanishing point,

$$V = \frac{S}{N}(Q + N),$$

the distance in inches along the “retro” scale from the most recent date to the vanishing point.
This point must be noted on the drawing, as all the radiating ordinates must be extended to pass
through this point, even though the point lies, of course, outside the finished chart.

(4) Measuring backward in inches along the “retro” scale from the most recent date to the vanish- 
ing point, plot the successive points of time at the values of aV, bV, cV, dV,... nV to form the de- 
sired “retro” scale. 

(5) From the vanishing point, V, project the converging lines corresponding to the horizontal 
ordinates of the orthodox chart. 
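
Steps (1) through (4) of this procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `retro_scale` and its argument names are hypothetical, and the example figures (50 time units, a 10-inch scale, R = 0.1) are made up for demonstration.

```python
def retro_scale(n_units, total_space, recent_ratio):
    """Return (Q, V, positions) for a perspective "retro" time scale.

    positions[k] is the distance in inches, measured backward from the
    most recent date, of the point k time units in the past.
    """
    N, S, R = n_units, total_space, recent_ratio
    if not 1.0 / N < R < 1.0:
        raise ValueError("R must satisfy 1/N < R < 1.0")
    Q = (N - R * N) / (R * N - 1.0)   # step (2): the constant Q
    V = S / N * (Q + N)               # step (3): distance to the vanishing point
    # step (4): the series a, b, c, ... n equals k / (Q + k), each times V
    positions = [V * k / (Q + k) for k in range(N + 1)]
    return Q, V, positions


# Illustrative figures only: 50 years on a 10-inch scale, with the most
# recent year given one tenth of the total space.
Q, V, positions = retro_scale(n_units=50, total_space=10.0, recent_ratio=0.1)
widths = [later - earlier for earlier, later in zip(positions, positions[1:])]
```

With these assumed figures the most recent unit occupies R times S, or one inch, the fifty units together fill the ten inches, and each earlier unit is narrower than the one after it.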

5 The vertical scale has not been discussed in the present text because the “retro” chart whether 
rectangular or triangular can obviously have the same variety of vertical scales that orthodox charts 
have. In Chart I, the arithmetical vertical scale has been used while Charts II, III, and IV have logarithmic
vertical scales. The regular spacing of ordinates for scale values in strict geometric progression
employed on these examples is believed to be helpful in explaining logarithmic charts to the layman 
and is certainly a great advantage for the triangular “retro” charts since it avoids excessive crowding 
of converging ordinate lines. Credit for this refinement belongs to Mr. Spurgeon Bell. 

6 The term vanishing point is here used to denote the point where converging ordinates meet in all 
triangular “retro” charts. This point can, of course, be placed wherever one wishes along the X-axis
in the case of some of the pseudo-perspective triangular “retro” charts discussed in the following para- 
graph, namely, in the logarithmic and the backward square-root scales. For the forward powers projec- 
tion, the point of origin in the X-axis would serve as a logical converging point for the ordinate rulings. 

7 This is true whether the vertical scale be arithmetical or logarithmic, for it is obvious that the
“retro” chart, which is but a perspective rendering of the orthodox chart, can have any vertical scale 
and still reproduce straight lines as straight lines. 










form a group which might be called “pseudo-perspective retro” charts. 
A backward logarithmic scale can be constructed very easily by count- 
ing the time-units backward from some arbitrary origin in the future 
and plotting the resulting values from a slide-rule.⁸ A somewhat similar
scale can be constructed by plotting the square-roots of these units. 


CHART III 
ORTHODOX CHART 
(with Logarithmic Vertical Scale)


This chart of average monthly Federal Government Receipts is comparable to the preceding “retro” 
chart (Chart II) showing precisely the same figures plotted on the orthodox time scale. It is this chart 
which we can imagine portrayed in Chart II after considerable expansion followed by rotation through 
nearly 90 degrees. 





Alternatively, the time-units can be counted forward from an assumed 
origin previous to the series, and plotted according to their squares or 
other powers, or according to their anti-logarithms. These and other 
devices have individual advantages and disadvantages, but in general 
serve to graduate the time-scale in what could be called a retrospective
manner.⁹

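
The backward logarithmic and backward square-root scales can be sketched as follows. This is a rough illustration, not a prescribed method: the function name `backward_scale`, the `offset` parameter (standing in for the "arbitrary origin in the future"), and the example figures are all assumptions.

```python
import math


def backward_scale(n_units, total_space, offset=1.0, transform=math.log):
    """Distances back from the most recent date for a pseudo-perspective scale.

    Time units are counted backward from an origin `offset` units in the
    future, transformed (logarithm or square root), and rescaled so that
    n_units fill total_space.
    """
    raw = [transform(k + offset) - transform(offset) for k in range(n_units + 1)]
    return [total_space * value / raw[-1] for value in raw]


# Illustrative figures only: 50 time units on a 10-inch scale.
log_positions = backward_scale(50, 10.0)                        # backward logarithmic
sqrt_positions = backward_scale(50, 10.0, transform=math.sqrt)  # backward square-root
```

With the assumed offset of one unit, the logarithmic scale gives equal space to the most recent unit, to the two units before it, and to the four before those, while the square-root scale compresses the remote past more gently.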
RECTANGULAR FORMS 


Between the foregoing triangular “retro” charts and the orthodox 
grid, there can be found a large group of intermediate compromises of 
rectangular shape. In these the feature of converging ordinates meeting 
at a vanishing point is abandoned and a return is made to the parallel 
horizontal ordinates of the orthodox grid. This is analogous to the iso- 


8 If it be true that among the factors which control inherited characteristics, a parent is twice as 
important as a grandparent, four times as important as a great-grandparent, eight times as important 
as a great-great-grandparent, then a logical analogy is established for the use of the logarithmic scale, 
in which the years march backward at such diminishing pace that the space given to the two years 
before the last is the same as the next previous four years, and also to the next previous eight years, and 
back of that to the next previous sixteen years, and so on in a geometric progression. 

9 In most “retro” charts a certain point of time is treated as a point of origin or reference, and the
time scale cannot be extended or extrapolated past this point any more than a curve on logarithmic 
rulings can pass from positive to negative values. In the perspective projection, and in the backward 
square-root and logarithmic projections, the chart can be extrapolated backward indefinitely, but not 
forward. In the forward antilog and powers projections the reverse limitation obtains, and the time 
scale can be extended or extrapolated forward to any extent but not backward. These differences will 
sometimes be important in determining the choice of type of “retro” chart to employ. It is also important
to note that in the pseudo-perspective triangular “retro” charts, the power to reproduce as a
straight line every straight line on orthodox charts has been lost.




metric drawing prepared by architects and engineers whereby uniform 
scales are employed in flat pictures showing various surfaces of a three-dimensional
solid. There results a sort of “Mercator’s projection” of the
triangular “retro” chart. (See Chart IV.) 

This group of hybrids is highly conventionalized, and is of value only 
when its conventional nature is fully and correctly understood. It at- 
tains uniformity of vertical scale across the chart, which is a great con- 


CHART IV 


“RETRO” CHART—ISOMETRIC TYPE 
(with Logarithmic Vertical Scale) 
Here is a third chart of the average Federal Government Receipts shown in Charts II and III, but
the horizontal rulings are parallel as in an orthodox chart. The retrospective scale used in the chart is a 
backward logarithmic projection such as can be easily plotted from a slide rule. 


[Chart IV: image not reproduced. The vertical scale is ruled at $25,000,000, $50,000,000, $100,000,000, $200,000,000, $400,000,000, and $800,000,000; the legible time-scale labels include 1940.]


venience in construction and in analytical reference. But it has sacri- 
ficed the quality of reproducing as a straight line any trend which would 
describe a straight line on an orthodox grid. Therefore the angles of 
slope cease to be comparable over different portions of the time-scale 
and become as misleading to the unwary as they are in the orthodox 
logarithmic chart for the arithmetically minded reader.¹⁰


10 For most persons these limitations of the “retro” chart will not be of much practical importance. 
Few readers of a chart concern themselves with the mathematical concepts of acceleration and decelera- 








OTHER PATTERNS OF EMPHASIS 


Lastly, there is a large group of charts analogous to “retro” charts 
which are not retrospective but achieve some other historical point of 
view. Thus charts could be designed to look into the future instead of 
into the past, and to see the distant future smaller than the near future. 
Better still, we may “panoram” from the past, through the present and 
into the future, in a single chart.¹¹ The graduations of such a chart
might taper off at each end by a reverse function of the normal probabilities
projection.¹² Or it may be desirable to expand in great detail two
periods and compress an intermediate period as well as the exterior 
periods of time. Such a bi-modal form, high-lighting the first and second 
world war periods, might be analytically useful for war-time planning. 
These statistical refinements will vary with different students and stud- 
ies. They will not, however, have universal historical value, nor will 
they achieve the most satisfactory simplicity, but for particular pur- 
poses they can provide a most powerful and illuminating chart-me- 
dium. 


STANDARDIZATION BENEFITS 


For use with other “retro” charts of the same type, especially when 
forming a large collection of charts with identical time-scales, to facili- 
tate comparison, the “retro” chart becomes a great practical conveni- 
ence. This is particularly true if—as is usually the case—the charts 
must extend backward to different dates because they show time-series 
which cover different lengths of time.¹³

Mathematically it would seem that by no other device could a wide 
variety of historical time-series be presented upon a single standard 
time-scale with satisfactory and proper individual detail for each one, 
when the series differ extremely in length.¹⁴





tion measured by successive differences and shown visually by the tangent of the curve. Also it should 
be remembered that the slopes of curves have only limited significance in any chart. This must be 
correctly understood by the reader or it may prove deceptive. Thus in the common arithmetically- 
scaled chart, the angle of slope is significant as to amount of change but deceptive as to rate of change, 
while on the ratio or semi-log chart the reverse is true, and the angle of slope is significant as to rate 
of change but may be most deceptive as to amount of change. 

11 This two-way historical chart or panorama can consist of two rectangular “retro” charts or of 
two triangular “retro” charts, in either case joined at the expanded portions of the scales. For very 
simple use, the double triangular form is preferable as it approximates the result obtained with a 
panoram camera. 

12 For an example of this projection, see Karsten, op. cit., p. 457. 

13 In this respect its convenience for use on the horizontal scale reminds one of the convenience with 
which the logarithmic vertical scale served to plot series fluctuating through widely different magni- 
tudes—a feature which appealed to many when log charts were first introduced. 

14 On the one hand, it will be found that a short time-series covering only the last five years, for




Such charts are easy to read because it is unnecessary for the reader 
to re-orient himself carefully for each new chart. Every time-period in 
the past is always located in the same position on every chart. 

Comparison of one chart with another is facilitated because any chart 
can be readily superimposed upon another. The reader can lay them 
before him with one above the other and compare them; he can hold 
them up to the light and look through the two of them; he can trace one 
curve upon the other mentally or physically. There is minimal trouble 
and no inaccuracy in these comparisons. 

A disadvantage of “retro” charts is that they are not so suitable as 
the orthodox chart for the type of time-lag analysis which consists of 
shifting one chart physically to the right or left before comparing it with 
another. With these charts it becomes necessary to keep on shifting and 
readjusting the lagged chart as one compares it at different time-periods.¹⁶
This can be performed with these charts by progressive readjustment.


One should not take too seriously the disadvantages and limitations 
which have been recited. They apply equally to the segmented double- 
gaited orthodox monstrosities which have been described, and which 
are not commonly quarrelled with. 

Since the “retro” chart is a popular, not a research tool, its merit lies 
in uses for which it proves to be a more truthful and effective medium of 





example, will be shown in great detail over about half of the chart page. In the examples given here 
(Charts II and IV) the last four years by reason of their wide spaces show monthly data, and before 
that eight years of quarterly data before the spacing becomes too narrow for any but annual figures. 
In the other extreme case, a time-series covering the last half century will be shown in its entirety upon 
the single chart page (by reason of the omission of detail in the picture of the earlier years) yet each of 
these two charts will be comparable with the other. It would be impossible on orthodox charts to show 
the 50-year chart on the same scale as the five-year chart unless either (1) the latter were compressed 
to a very small size and its monthly details omitted, or (2) the 50-year chart were made extremely long 
so as to extend over several or many pages. Either of these two solutions would obviously be so unsatisfactory
that the effort to attain standardization from chart to chart would necessarily be abandoned.
In the end, each chart would be presented on its own appropriate scale with no intercomparability from 
chart to chart. 

15 To facilitate cross comparisons, all identifying marks along the time scale should be identical in
style and typography. It is, of course, not necessary for the scale marks on a short series to be carried 
back beyond where the curve begins, but the abscissae should be alike whenever they occur. It may be 
useful to use extra heavy rulings for special abscissae such as the divisions between decades. It may be 
especially helpful to adopt a method of shading a portion of the chart lying under the curve between 
certain significant abscissae, thus the area representing war years or the depression years might be in- 
variably shaded in a characteristic and recognizable fashion. These landmarks would help the reader to 
orient himself on each chart. 

16 This is not difficult since one is only keeping the two backgrounds “in gear.” But if a stationary 
picture is desired, it would be necessary to replot the curve with the appropriate time-lag or -lead. This 
limitation is likely to be over-rated. Actually this type of analysis is much less frequently practiced than 
we imagine. Few time-series have rational bases for anything beyond very short time-lags or -leads (see 
Karl Karsten, “The Theory of Quadrature in Economics,” this JOURNAL, March 1924, pp. 14-29). It is
the power to perform this analysis that we have been taught to prize. 










information.¹⁷ This is the case when its peculiar arrangements convey
to the reader their message with just the degree of detail desired. To a 
busy executive, in such cases, a service of these charts, besides being 
more informative, will be a time-saver. 


17 Twenty years ago the present writer prepared blank pages for history note-books in which the 
years were printed down the edge of the page in a backward logarithmic progression. These forms were 
found to be useful for historical notes on the widest variety of topics. Whether one was filling in the 
chronology of Chinese art, of American history, of European architecture, or of geological cycles, the 
retrospective scale was found to make appropriate provisions for the various periods covered. On such 
a scale it is quite possible to plot the historical events back to the approximate time of the formation of 
the earth with conveniently indicated margins of probable error of such dates. In recent years there has 
appeared on the market a brightly colored “Histogram” of evolution for schoolroom purposes tracing in 
several yards of such backward logarithmic-scale retrospective charts, the history of different kinds of 
plant and animal life from the first evidences of such life down to the present time. 







A MECHANICAL DETERMINATION OF CORRELATION 
COEFFICIENTS AND STANDARD 
DEVIATIONS 


By John R. Platt
Physics Department, Northwestern University

In computing a bivariate correlation coefficient by the product moment
method, it is common practice to plot a “scatter diagram” of
the two variables on a graph as an aid in visualizing the problem. How- 
ever, it is not generally appreciated that the correlation coefficient, r, 
can be obtained dynamically from such a scatter diagram, if the points 
on the graph are given physical weight and if the graph itself is a fairly 
rigid web such as a wire mesh, by finding the physical moments of 
inertia of the system about various axes. 

This procedure is subject to experimental error, but with careful
technique, r can be found with an accuracy of .02; and the standard
deviations, $\sigma_x$ and $\sigma_y$, of the two sets of values are obtained in the process
with an accuracy of better than 1 per cent. After the “points” have
been located in the “graph,” the time it takes to determine the moments
with a simple torsion pendulum is about 8 minutes, and the time to
compute r, $\sigma_x$, and $\sigma_y$ is less than 5 minutes. The only arithmetic necessary
is an initial determination of the means of each set of values, and
the final substitution in simple formulae. Any number of pairs of values
can be handled from 20 to 200 or more.

It is felt that this method can be used to advantage in statistical 
problems where punch-card and calculating machines are not available 
or where ease of computation outweighs in importance absolute ac- 
curacy of result. Probably, however, its especial value would be in 
teaching the principles of correlation to students; for in this method the 
physical interpretation of correlation—as a measure of scatter from a 
perfect prediction line—is never obscured by a maze of arithmetical 
details. 

THEORY OF OPERATION 


The standard deviation, $\sigma_x$, of a set of values, $x_i$, from a mean value
$\bar{x}$, is given by

$$\sigma_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$$

where N is the number of values in the set. The “product moment” correlation
coefficient between two matched sets of values, $x_i$, $y_i$, is given
by

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}.$$


The sums involved in these expressions are familiar in mechanics in
the determination of the inertial ellipsoid for a set of unit mass points
in the xy-plane connected by a rigid weightless web. If the mass points
have the coordinates $x_i$, $y_i$, then $\sigma_x$ is just the “radius of gyration” of
the system about the y-axis,

$$\sigma_x = \sqrt{\frac{Y}{NI}} \qquad (1a)$$

and similarly

$$\sigma_y = \sqrt{\frac{X}{NI}} \qquad (1b)$$

and r is given by

$$r = \frac{P_{xy}}{\sqrt{XY}}.$$

I is the “moment of inertia” of unit mass at unit distance from the axis
of rotation; X and Y are the “moments of inertia” of the system about
axes respectively parallel to the x and y axes and passing through the
center of gravity $(\bar{x}, \bar{y})$; $P_{xy}$ is the so-called “product of inertia.” The nomenclature
of correlation was taken by Pearson directly from these
dynamical terms in his first analysis of the theory of correlation;¹ but
so far as the author knows, no previous use has been made of these
dynamical analogies for actual computation of correlation coefficients.
(Harsh and Stevens in their correlation machine utilize only the static
analogy between the mean of a set of points and the center of gravity of
a set of equal weights.²)
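
The correspondence between the statistical sums and the mechanical moments can be checked numerically. The sketch below is illustrative only: the function name `inertia_moments`, the data points, and the choice I = 1 are assumptions, not values from the apparatus.

```python
import math
import statistics


def inertia_moments(points, unit_moment=1.0):
    """Moments X, Y and product P_xy of equal mass points about centroidal axes."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    X = unit_moment * sum((y - y_bar) ** 2 for _, y in points)  # axis parallel to x
    Y = unit_moment * sum((x - x_bar) ** 2 for x, _ in points)  # axis parallel to y
    P = unit_moment * sum((x - x_bar) * (y - y_bar) for x, y in points)
    return X, Y, P


points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]   # made-up data
N, I = len(points), 1.0
X, Y, P = inertia_moments(points, unit_moment=I)

sigma_x = math.sqrt(Y / (N * I))     # radius of gyration about the y-direction axis
sigma_y = math.sqrt(X / (N * I))     # radius of gyration about the x-direction axis
r = P / math.sqrt(X * Y)

# The radius of gyration agrees with the ordinary population standard deviation.
direct_sigma_x = statistics.pstdev([x for x, _ in points])
```

Note that the moment about the axis parallel to y involves the x-deviations, which is why Y appears in the expression for $\sigma_x$ and X in that for $\sigma_y$.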

$P_{xy}$ is related to actual moments in several ways, the simplest connection
being deduced from the formula

$$D = X \cos^2\theta + Y \sin^2\theta - 2P_{xy} \sin\theta \cos\theta$$

(where D is the moment of inertia about an axis in the xy-plane making
an angle $\theta$ with the positive x-axis), by setting $\theta = 45°$. Then


1 Karl Pearson, “Regression, Heredity, and Panmixia,” Philosophical Transactions of the Royal 
Society, A, 187, 253 (1896). 

2 C. M. Harsh and S. S. Stevens, “A Mechanical Correlator,” American Journal of Psychology, 51:
727-30; October 1938. 




































$$P_{xy} = \tfrac{1}{2}(X + Y) - D$$

giving

$$r = \frac{\tfrac{1}{2}(X + Y) - D}{\sqrt{XY}}. \qquad (2)$$

r is thus computed from three moments of inertia of the system; $\sigma_x$ and
$\sigma_y$ are each computed from one of these same moments (if I is known beforehand).
This formula for r can be simplified for ordinary computation
by reexpressing it in the rapidly convergent power series

$$r = \left[1 - \frac{D}{\tfrac{1}{2}(X + Y)}\right]\left[1 + \frac{1}{2}\left(\frac{X - Y}{X + Y}\right)^2 + \frac{3}{8}\left(\frac{X - Y}{X + Y}\right)^4 + \cdots\right]. \qquad (2')$$

To find r from this formula, first find

$$d = |X - Y|, \qquad a = \tfrac{1}{2}(X + Y),$$

and d/a.

If d/a < .4,

$$r = r_1 = 1.00 - \frac{D}{a} \qquad (2a)$$

with an error less than .02 from the approximation.

If d/a < .8,

$$r = r_1\left[1 + \frac{1}{8}\left(\frac{d}{a}\right)^2\right] \qquad (2b)$$

with an error less than .01 from the approximation. The condition for
using equation 2b means that neither of the moments X and Y may be
more than 2.2 times the other; to avoid such a case, the geometrical
ranges of x and y on the web (which vary approximately with X and Y)
can be adjusted beforehand to approximate equality by giving each coordinate
an appropriate scale factor in plotting. Higher precision than
equation 2b is probably not worth while, since the statistical probable
error of r is greater than .01 unless the system has more than 500 cases
and r is greater than .90.
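
The relation between equation 2 and the approximations 2a and 2b can be illustrated numerically. The following is a minimal sketch; the function names and the three moment values are made up for the example.

```python
import math


def r_exact(X, Y, D):
    """Equation (2): r from the three moments of inertia."""
    return (0.5 * (X + Y) - D) / math.sqrt(X * Y)


def r_approx(X, Y, D):
    """Equations (2a) and (2b): returns (r1, r2, d/a)."""
    a = 0.5 * (X + Y)
    d = abs(X - Y)
    r1 = 1.0 - D / a                          # equation (2a)
    r2 = r1 * (1.0 + (d / a) ** 2 / 8.0)      # equation (2b)
    return r1, r2, d / a


# Made-up moments in arbitrary units (for instance, seconds squared).
X, Y, D = 144.0, 100.0, 50.0
r1, r2, ratio = r_approx(X, Y, D)
exact = r_exact(X, Y, D)
```

For these assumed values d/a is below .4, so 2a already lies within .02 of equation 2, and 2b is closer still.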















































MECHANICAL ARRANGEMENT 


The physical apparatus consists of the weighted “points,” the wire 
mesh or “graph,” and a torsion pendulum for measuring the three 
moments of inertia needed in the above formulae. The “points” are 
brass escutcheon pins } inch long, which can be bought at a hardware 
store. (Lead pins would be better, if obtainable.) The “graph” is a flat 
piece of window screen (iron will do, but aluminum is better) of 12 by 12 
mesh, cut square with 51 meshes (or any other convenient odd number) 
to a side; the two center rows of meshes (x and y axes) and every tenth 
row in each direction are marked lightly with red paint to help in locating
positions on the graph. The moments of inertia of the mesh alone
about the three axes used in the formulae, $X_0$, $Y_0$, $D_0$, are constants and
need be measured only once; they must be subtracted from the moments
measured in a problem to give the moments due to the escutcheon
pins alone. If the mesh is accurately square,

$$X_0 = Y_0 = D_0.$$

The torsion pendulum is a 10 inch length of .009 inch steel music 
wire, hung vertically from a solder joint on a stable metal support, and 
having soldered at its lower end a small tin clip to grasp the wire mesh. 
When the clip grips the edge of the mesh, the latter hangs in a vertical 
plane; if it is then rotated 180° about the axis of the wire and released, 
it will oscillate back and forth for 20 or more cycles (depending on the 
distribution of pins and dimensions of the mesh and music wire) until 
air resistance brings it to rest. The period, T, of the oscillation is related
to the moment of inertia about the axis of rotation as follows:

$$T_{y0}^2 = kY_0, \text{ etc.},$$

where k is a constant. Since in our formulae only ratios of moments are
involved, we can measure the moments in arbitrary units; in fact we can
replace each moment in our formulae by the corresponding value of $t^2$,
where t is the time for the pendulum to make any convenient constant
number of oscillations, say 25.

The times are determined by a stopwatch reading to 0.1 second; the 
usual precautions in counting oscillations must be observed. For ease 
of counting oscillations, the length of the music wire should be set so 
that a single vibration takes between 1 and 2 seconds; for accuracy, the 
number of oscillations to be counted should be chosen to give times of 
at least 30 seconds with the pins located in the mesh for an average
problem. When graphing fewer than 35 sets of values, it may be convenient
to double all the moments and increase the times by putting two
pins side by side on the graph to represent each point.













PROCEDURE 


Then to solve a problem, we first need to know the constants of the
apparatus, $X_0$, $Y_0$, $D_0$, and I:

$$X_0 = t_{x0}^2, \qquad Y_0 = t_{y0}^2, \qquad D_0 = t_{d0}^2,$$

$$I = \frac{t_1^2 - t_{y0}^2}{20{,}000}.$$

($t_1$ is the time about the y-axis of the mesh when it has 50 pins in it, 25
on the line x = 20 and 25 on the line x = −20.)

Next from the given matched set of x- and y-values we find by
arithmetic the mean of each set. These means, rounded off to the nearest
unit, are then the values of the x- and y-axes on the wire mesh. (See
the appendix for a discussion of the error introduced in rounding-off.)
A scale is chosen so that the ranges of x and y approximately fill the 
mesh in each direction. The pins are pressed into the mesh as shown in 
the Chart; this operation goes more easily if the mesh is supported at 
the edges a little way above the work table. An aid to locating values 
on the graph quickly is a sheet of paper under the mesh marked with the 
numerical values assigned to the red-painted rows and columns. The 
time necessary for graphing is comparable to that of graphing the same 
points with a pencil and cross-section paper; our average is about 15 
minutes to insert and check 50 points. 

When the graph is completed, the mesh is clipped to the music wire
at the positive x-axis, and 25 oscillations timed, giving $t_x$; then the
same for the positive y-axis, giving $t_y$; then the same for the 45° angle
(the clip attached to the upper right-hand corner of the mesh), giving
$t_d$. (See Chart.)

Now we compute

$$X = t_x^2 - X_0, \qquad Y = t_y^2 - Y_0, \qquad D = t_d^2 - D_0;$$

the standard deviations and correlation coefficient are obtained immediately
on substituting these into equations 1 and 2. Note that $\sigma$
from equation 1 is in mesh units and must be multiplied by the proper
scale factor to put it in the numerical units of the problem.
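
The whole computation from stopwatch readings can be sketched as below. This is an illustration under assumptions, not a record of an actual run: the function name, the `scale_x` and `scale_y` parameters (problem units per mesh unit), and every timing value are hypothetical.

```python
import math


def correlate_from_timings(t_x, t_y, t_d, t_x0, t_y0, t_d0, t_1, n_points,
                           scale_x=1.0, scale_y=1.0):
    """Return (sigma_x, sigma_y, r) from times for a fixed number of oscillations."""
    # Constants of the apparatus: bare-mesh moments and the unit moment I.
    X0, Y0, D0 = t_x0 ** 2, t_y0 ** 2, t_d0 ** 2
    I = (t_1 ** 2 - t_y0 ** 2) / 20000.0
    # Moments due to the pins alone, in units of seconds squared.
    X = t_x ** 2 - X0
    Y = t_y ** 2 - Y0
    D = t_d ** 2 - D0
    sigma_x = scale_x * math.sqrt(Y / (n_points * I))   # equation (1a)
    sigma_y = scale_y * math.sqrt(X / (n_points * I))   # equation (1b)
    r = (0.5 * (X + Y) - D) / math.sqrt(X * Y)          # equation (2)
    return sigma_x, sigma_y, r


# Hypothetical stopwatch readings in seconds, for 50 plotted points.
sigma_x, sigma_y, r = correlate_from_timings(
    t_x=19.2, t_y=18.0, t_d=16.6,
    t_x0=15.0, t_y0=15.0, t_d0=15.0,
    t_1=25.0, n_points=50)
```

The pin-length correction described under Errors and Corrections is omitted here; the standard deviations come out in mesh units unless the scale factors are supplied.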

If the value of D is greater than a, a negative correlation is indicated; 

















this can be determined more accurately by using in equation 2a the 
quantity D’ instead of D, and reversing the sign of r, D’ being found by 
moments about a diagonal axis through the upper left-hand corner of 
the mesh. 


ERRORS AND CORRECTIONS 


With $t_x$ and $t_y$ about 30 seconds and $t_{x0}$ and $t_{y0}$ about 15 seconds, an
experimental probable error of 0.1 second leads to a probable error in
each standard deviation of about 0.6 per cent. These same values lead
to an experimental error of 0.8 per cent in the D/a term of equation 2a;
this produces in r an experimental uncertainty changing linearly from
.008 when r is zero to .000 when r is unity. If a probable error of 0.1 second
is assigned to $t_d$ and to $t_{d0}$, in the same problem the r acquires an
additional experimental error of .011 at zero changing to .007 at unity.
This suggests that $t_d$ should be determined twice for increased accuracy,
while one measurement is enough for $t_x$ and $t_y$.

The escutcheon pins have so far been treated as mass points. This is 
a good approximation unless the highest accuracy is desired; then it is 
necessary to consider the effect of the finite length of the pins. This 
effect increases the apparent value of each of the three moments by a 
constant amount, $NI_p$, which can be found as shown in the appendix;
this quantity must then be subtracted from X, Y, and D as determined 
above before using them in the formulae. In the problems which have 
been worked out so far, the effect of this correction on the computed r- 
value has been about .01. 

Several correlation coefficients have been determined on the appara- 
tus described here, and the agreement in each case with the results of 
arithmetical computation by standard methods has been satisfactory. 

The device can also be used on problems other than that of bivariate
correlation, in particular for measurement of the standard deviation,
$\sigma$, or of the sum of the squares of the deviations, $N\sigma^2$, of a single set of
values, and for measurement of the product-moment, $P_{xy}$, of two sets
of values, a quantity needed for many statistical problems of all kinds.
However, such applications will be immediately apparent from the dis- 
cussion above without further elaboration. 

We hope that the somewhat detailed treatment of design, operation, 
and sources of error given here will enable others to construct and use 
this apparatus. The dimensions and specific values mentioned are in- 
tended as guides to such construction and use and not as any limitation 
on the application of the physical principles involved. 
















APPENDIX 


The added moment, $I_0$, due to the finite length of each pin, can be
found by measuring the moment $I_1$ about the y-axis with 50 pins 20
units from that axis, and then the moment $I_2$ with 50 pins only 2 units
from that axis.

$I_1 = I(\text{ideal})_{50,20} + 50 I_0$,
$I_2 = I(\text{ideal})_{50,2} + 50 I_0$.

But

$I(\text{ideal})_{50,2} = .01\, I(\text{ideal})_{50,20} = .01(I_1 - 50 I_0)$.

So

$I_0 = \frac{1}{49.5}(I_2 - .01 I_1) = \frac{1}{49.5}\left[t_2^2 - t_{y0}^2 - .01(t_1^2 - t_{y0}^2)\right]$.
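The elimination above can be checked numerically. What follows is a minimal sketch in Python, not part of the apparatus or of the original computation; the function name `pin_length_moment` and its parameters are hypothetical, and the moments are assumed to be expressed in units of pin mass times the square of the mesh width.

```python
def pin_length_moment(i_far, i_near, n_pins=50, d_far=20.0, d_near=2.0):
    """Added moment per pin, I_0, from two measured moments.

    i_far  : moment measured with n_pins pins at distance d_far from the axis
    i_near : moment measured with the same pins at distance d_near
    The ideal (mass-point) moments scale as distance squared, so
    i_near = ratio * (i_far - n_pins * I_0) + n_pins * I_0.
    """
    ratio = (d_near / d_far) ** 2          # .01 for 2 units against 20 units
    return (i_near - ratio * i_far) / (n_pins * (1.0 - ratio))


# Illustrative check with an assumed true I_0 of 1.0:
# 50 mass points at 20 units give 50 * 400, at 2 units give 50 * 4.
i_far = 50 * 20.0 ** 2 + 50 * 1.0
i_near = 50 * 2.0 ** 2 + 50 * 1.0
recovered = pin_length_moment(i_far, i_near)
```

With 50 pins and distances of 20 and 2 units the divisor is 50(1 - .01) = 49.5, the constant appearing in the formula above.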

An error is also introduced in rounding off the mean to the nearest 
unit before plotting. This error is due to the fact that the centers of 
gravity of the mesh and the points do not coincide, and the system 
therefore does not oscillate about the center of gravity of the points 
as we have assumed. However, if the points are, say, three times as 
heavy as the mesh, the center of gravity of the points must lie within 
0.2 mesh units of the axis of oscillation, and the increase in moment is 
at most 


$N(.2)^2 + \tfrac{1}{3}N(.5)^2 = .12N$
with the pin mass and the mesh width taken as unity. In the same units,
the correction due to finite pin length is about
$N(1.0)^2$
since the radius of gyration of each pin (} inch long) is about one 


mesh unit (1/12 inch). The effect of the rounding-off is therefore considerably
smaller even than the quantity $I_0$ and may be ignored.














THE EVIDENCE FOR PERIODICITY IN 
SHORT TIME SERIES 


By Truman L. Kelley
Harvard University 


MANY TIME SERIES are of short duration because the recording of the
event has, of necessity, proceeded for but a short time. Though
it may be necessary that the student bide his time and wait for the years 
to pass in order to secure sufficient data, we should come to this con- 
clusion only after having exhausted the possibilities of small sample 
theory. A method which applies to short series and which enables a 
judgment of the hypothesis with fiducial limits is important in that, 
whatever the time span, the issue can be investigated and the hypothe- 
sis substantiated, or rejected, within the fiducial limits set by the inves- 
tigator. 

Let the time variable be $X_1$ and the function in which we postulate
periodicity be $X_0$. We shall employ M for the mean, V for the variance,
$\bar{X}_0$ to indicate an $X_0$ as estimated from $X_1$ by means of a linear regression
line, $\tilde{X}_0$ as estimated by means of a quadric regression line, $\hat{X}_0$ as
estimated by means of a cubic regression line, etc. As is usual in multiple
correlation, the dot in a subscript will indicate “independent of,”
thus $X_{0.1}$ is that part of $X_0$ which is independent of $X_1$, $X_{0.11^2}$ is that part
of $X_0$ which is independent of $X_1$, $X_1^2$, and $X_{0.11^21^3}$ is that part of $X_0$
which is independent of $X_1$, $X_1^2$ and $X_1^3$, etc.

Before investigating periodicity we should first take out any trend 
that may exist. For the purposes of precise test, as hereinafter applied, 
it is important that the trend be represented by a line that places linear 
restrictions only upon $X_0$. Consider the following series of estimates
of $X_0$:

Estimate of $X_0 = M_0$    zero regression (1)
$\bar{X}_0 = a + bX_1$    linear regression (2)
$\tilde{X}_0 = A + BX_1 + CX_1^2$    quadric regression (3)
$\hat{X}_0 = \alpha + \beta X_1 + \gamma X_1^2 + \delta X_1^3$    cubic regression. (4)


When no trend is removed, equation (1) gives the estimate of $X_0$ and
the quantities related to $X_1$ in which we seek periodicity are $(X_0 - M_0)$
of the following equation, which expresses $X_0$ as the sum of independent
parts:

$X_0 = M_0 + (X_0 - M_0)$. (5)

If the number of observations in the series is N, the corresponding degrees
of freedom equation is

$(N - 1) = 0 + (N - 1)$ (6)

and the corresponding equation of variances is

$V_0 = 0 + V_0$. (7)


When a linear trend is removed by equation (2), we have

$X_0 = \bar{X}_0 + X_{0.1}$ (8)
$(N - 1) = 1 + (N - 2)$ (9)
$V_0 = V(\bar{X}_0) + V_{0.1}$. (10)

To ascertain if the linear trend is greater than chance may be expected
to yield, we compute the variance ratio $F_{1,(N-2)}$ having one degree of
freedom in the numerator and (N−2) degrees of freedom in the denominator:

$F_{1,(N-2)} = \frac{V(\bar{X}_0)/1}{V_{0.1}/(N - 2)} = \frac{V(\bar{X}_0)(N - 2)}{V_{0.1}(1)}$ (11)


P from this F gives the probability that a divergence from (1), or zero
regression, as great as that observed, b of equation (2), would arise as
a matter of chance if the true value of b is zero. Values of P progressively
less and less than .5 provide greater and greater evidence that b
is not a chance deviation from zero.

Having, by this procedure, established that there is a trend, we can 
next investigate to see if it is of a quadric curvilinear sort. By the 
method of least squares we obtain the quadric regression equation (3) 
and related constants, and write: 


$X_0 = \tilde{X}_0 + X_{0.11^2}$ (12)
$(N - 1) = 2 + (N - 3)$ (13)
$V_0 = V(\tilde{X}_0) + V_{0.11^2}$. (14)

Also $X_0$ may be divided into independent parts thus:

$X_0 = (\tilde{X}_0 - \bar{X}_0) + \bar{X}_0 + X_{0.11^2}$ (15)
$(N - 1) = 1 + 1 + (N - 3)$ (16)
$V_0 = [V(\tilde{X}_0) - V(\bar{X}_0)] + V(\bar{X}_0) + V_{0.11^2}$. (17)

The variance ratio

$F_{1,(N-3)} = \frac{[V(\tilde{X}_0) - V(\bar{X}_0)](N - 3)}{V_{0.11^2}}$ (18)

having one degree of freedom in the numerator and (N−3) in the denominator,
will yield a P that tests the significance of C of equation (3).

























If this is significant then the trend to be removed is that given by the 
quadric regression line. 

A similar procedure can test for cubic regression, $\delta$ of equation (4),
but should hardly be resorted to if the interest is in the existence of
periodicity, for a cubic will itself be a close approximation to a complete
cycle of a certain sine curve. The cubic curve of Chart I, which
cuts the quadric much as would a sine curve with a period of 16 years,
is an illustration of this.


CHART I
BIRTHS IN U. S. 1920-1940
[Chart: annual number of births (in thousands) by year, 1920 to 1940. Legend: actual number; linear regression; quadric; cubic.]


Let us say that we have removed a quadric trend, obtaining residuals
$X_{0.11^2}$, which are to be investigated for periodicity. We shall use a periodogram
technique with two modifications. The usual periodogram is
as shown in the dot line of Chart II. The correlation ratio squared,
$\eta^2$, is plotted for successively greater periods, T. The argument is that
in the neighborhood of that period yielding the largest $\eta^2$ will be found
the actual period of the data. For the argument to be precise it is necessary
that a probability statement be attached to the outcome. Thus the
essential modification recommended is that instead of an ordinate $\eta^2$,
we employ the ordinate (1−P), the P being that from a variance ratio,
the numerator variance being that of the means of the classes into
which the data have been grouped when items of the same phase for
the period T constitute a group, and the denominator variance being the
customary residual, or error, variance. The second modification gives














an approximation to this first while at the same time it gives a quantitative
statement of the square of the correlation between time treated as
a periodical variable of period T and the function, $X_{0.11^2}$, being investigated.
This second modification is to plot $\epsilon^2$, the correlation ratio squared
corrected for fineness of grouping,¹ instead of (1−P), with which it is
highly correlated, and instead of the usual $\eta^2$, from which it diverges
considerably because of the systematic error inherent in $\eta^2$. The illustration
which follows will elucidate the relationships between (1−P),
$\epsilon^2$ and $\eta^2$.

CHART II
PERIODOGRAM
[Chart: ordinates on an $\eta^2$, $\epsilon^2$ scale and a P scale; abscissa, period in years, 2 to 18.]

¹ Truman L. Kelley, “An Unbiased Correlation Ratio Measure,” Proceedings of the National Academy
of Sciences, Vol. 21, No. 9, September 1935, pp. 554-559. Also discussed by C. C. Peters and
W. VanVoorhis, Statistical Procedures and their Mathematical Bases, 1940, Ch. XI and XII.

Let there be N equally time-spaced observations $X_0$. Let a quadric
trend be removed, giving N residuals $X_{0.11^2}$ having (N−3) degrees of
freedom. Let the period under investigation be T of the equally spaced
time intervals. Of necessity 1 < T < (N−2) and the practical limits of
utility of T, fractional or integral, may be taken such that 2 ≤ T ≤ (N−3).
If a linear, instead of quadric, trend is removed 2 ≤ T ≤ (N−2),
and if a zero trend, equation (1), is involved then 2 ≤ T ≤ (N−1). If
N is exactly divisible by T, there are T classes each having N/T measures,
$X_{0.11^2}$, of the same phase in it. If N is not exactly divisible by T,
we have a variable number of measures from phase to phase and the
number of classes will be some number, which we will call k, greater
than N/T. Let the means of the $X_{0.11^2}$ values in these classes be
$M_{0a}, M_{0b}, \ldots, M_{0k}$. The variance of these means, appropriately
weighted with the numbers of cases in each class, is designated $V(M_{0k})$.
The mean for each class is independent of that for every other class except
that the mean of the means is equal to zero, so that there are just
(k−1) degrees of freedom. Letting i take all values from a to k, we can
express the variable $X_{0.11^2}$ as equal to the sum of two independent parts,


$X_{0.11^2} = M_{0i} + X_{0.11^2k}$ (19)
$(N - 3) = (k - 1) + (N - 2 - k)$ (20)
$V_{0.11^2} = V(M_{0k}) + V_{0.11^2k}$ (21)

$F_{(k-1),(N-2-k)} = \frac{V(M_{0k})/(k - 1)}{V_{0.11^2k}/(N - 2 - k)} = \frac{[V(M_{0k})](N - 2 - k)}{[V_{0.11^2} - V(M_{0k})](k - 1)}$ (22)

As P from this variance ratio is less and less than .5, there is greater
and greater evidence that a period in the neighborhood of period T
exists. The usual procedures for testing periods which are fractional
introduce no complication if proper allowance is made for the decrease
in N because certain $X_0$ values are not used.

The computations are quite simple, even that of P from F, if Pearson’s
Tables of the Incomplete Beta Function (1934) are used. The illustration
following suggests that $\epsilon^2$ may be used in lieu of (1−P). It also
has the advantage of being an unbiased measure of the strength of the
correlation maintaining. Chart II shows the relationship of (1−P), $\epsilon^2$,
and $\eta^2$, for a certain problem.

The data chosen to illustrate the procedure described are the number
of births in the United States for the years 1920 to 1940. The series is
short, N = 21. A trend of uncertain nature is present. The likelihood of
periodicity is sufficiently small to lead to borderline issues and critical
tests. The Table herewith gives the basic data and certain derived data
used for testing for different periods.


The columns in order provide:
Column 1: Year
Column 2: Number of births
Column 3: $\tilde{X}_0$, number estimated by (23)
Column 4: $X_{0.11^2}$, residual deviations to be tested for periodicity.
Column 5: Column 4 values opposite a are in the same phase and
thus constitute one class. Values opposite b are in the
same, and a different, phase and constitute the second
class, when the period being investigated is 2 years.
Columns 6, 7, ..., 21 are similar for periods of 3, 4, ..., 18 years.
At the feet of columns 5, 6, ..., 21 are recorded, in order,
$\eta^2$, $F_{(k-1),(N-2-k)}$, P corresponding, and $\epsilon^2$.

BIRTHS IN THE UNITED STATES 1920-1940
(Columns 3 and 4 in hundreds of births)

(1) Year $X_1$ | (2) Births $X_0$ | (3) $\tilde{X}_0$ | (4) $X_{0.11^2}$
1920 | 1508874 | 16219 | −1130
1921 | 1714261 | 16885 | 258
1922 | 1774911 | 17514 | 235
1923 | 1792646 | 18105 | −179
1924 | 1930614 | 18658 | 648
1925 | 1878880 | 19174 | −385
1926 | 1856068 | 19652 | −1091
1927 | 2137836 | 20093 | 1285
1928 | 2233149 | 20496 | 1835
1929 | 2169920 | 20862 | 837
1930 | 2203958 | 21190 | 850
1931 | 2112740 | 21480 | −353
1932 | 2074042 | 21733 | −993
1933 | 2081232 | 21949 | −1137
1934 | 2167636 | 22127 | −451
1935 | 2155105 | 22267 | −716
1936 | 2144790 | 22370 | −922
1937 | 2203337 | 22435 | −402
1938 | 2286962 | 22462 | 408
1939 | 2265588 | 22453 | 203
1940 | 2360399 | 22405 | 1199

[Columns 5 to 21: phase letters a, b, c, ..., assigned cyclically to successive years for periods of 2 to 18 years, with foot rows $\eta^2$, F, P and $\epsilon^2$ for each period.]

Source: World Almanac.
We first investigate trend. $X_0$ = number of births per year. $X_1$ = date
and for convenience we let $x_1 = X_1 - 1930$. We find:

$M_0 = 2050140$, and $V_0 = 4617948 \times 10^4$
$\bar{X}_0 = 2050140 + 30929 x_1$,
$r^2_{01} = .75956$.

To test the significance of the deviation of 30929 from zero we compute

$F_{1,19} = \frac{V_0 r^2_{01}(N - 2)}{V_0(1 - r^2_{01})} = 60.022$

which yields a very small P. We next fit a quadric regression line, obtaining

$\tilde{X}_0 = 2118995 + 30929 x_1 - 1877.9 x_1^2$ (23)
$r^2_{0.11^2} = .84113$

$F_{1,18} = \frac{(V_0 r^2_{0.11^2} - V_0 r^2_{01})(N - 3)}{V_0(1 - r^2_{0.11^2})} = 9.2419$

yielding P = .048. Having odds of 952 to 48 that −1877.9 is not a
chance deviation from zero we take (23) as the trend line and compute
residuals

$X_0 - \tilde{X}_0 = X_{0.11^2}$

as recorded in column 4 of the Table herewith.
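The trend computations above can be retraced with a few lines of code. The following is a minimal sketch in Python, offered as an illustration rather than as the procedure actually used for the article; the variable names are hypothetical, and the births are those of column 2 of the Table. Because $x_1$ runs symmetrically from −10 to 10, the terms $x_1$ and $x_1^2 - \text{mean}(x_1^2)$ are orthogonal to each other and to the constant, which allows each coefficient to be fitted separately.

```python
births = [1508874, 1714261, 1774911, 1792646, 1930614, 1878880, 1856068,
          2137836, 2233149, 2169920, 2203958, 2112740, 2074042, 2081232,
          2167636, 2155105, 2144790, 2203337, 2286962, 2265588, 2360399]
n = len(births)
x = [year - 1930 for year in range(1920, 1941)]   # x_1 = X_1 - 1930

mean_y = sum(births) / n
ss_total = sum((y - mean_y) ** 2 for y in births)  # N times V_0

# Linear trend: slope b and the sum of squares it explains.
sxx = sum(v * v for v in x)
b = sum(v * y for v, y in zip(x, births)) / sxx
ss_lin = b * b * sxx
r2_lin = ss_lin / ss_total
f_lin = ss_lin * (n - 2) / (ss_total - ss_lin)      # F with 1 and N-2 d.f.

# Quadric trend: centered square term, orthogonal to 1 and x here.
mean_x2 = sxx / n
q = [v * v - mean_x2 for v in x]
sqq = sum(v * v for v in q)
c = sum(v * y for v, y in zip(q, births)) / sqq
ss_quad = ss_lin + c * c * sqq
r2_quad = ss_quad / ss_total
f_quad = (ss_quad - ss_lin) * (n - 3) / (ss_total - ss_quad)  # 1 and N-3 d.f.

a_quad = mean_y - c * mean_x2                       # intercept of (23)
residuals = [y - (a_quad + b * v + c * v * v) for v, y in zip(x, births)]
```

Under these assumptions the sketch gives a slope near 30929, squared correlations near .760 and .841, and variance ratios near 60 and 9.2, in agreement with the figures quoted above. It does not compute P, for which tables of the incomplete beta function or an equivalent routine would be needed.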
We have computed a cubic regression line, not as a part of the procedure
of determining periodicity, but because it is interesting to compare
such a line with the evidence available as to periodicity.

$\hat{X}_0 = 2118994 + 11149 x_1 - 1877.8 x_1^2 + 300.61 x_1^3$
$r^2_{0.11^21^3} = .89917$

$F_{1,17} = \frac{(V_0 r^2_{0.11^21^3} - V_0 r^2_{0.11^2})(N - 4)}{V_0(1 - r^2_{0.11^21^3})} = 9.7856$

yielding P = .045. This establishes the non-chance nature of the coefficient
300.61. We will later note that we do not establish periodicity with
the certainty that we here establish the fitness of a cubic regression line.
To test for the existence of a two-year period the residuals of column
four are to be grouped into two classes, a and b as indicated in column
five. The a class has 11 measures and yields a mean, $M_{0a}$, and the 10
measures of class b yield a mean, $M_{0b}$. Weighting these means 11 and 10
respectively and computing their variance, we obtain $V(M_{02})$, the subscript
2 being k, the number of classes involved. Analyzing variance as
indicated in (21) and employing (22) we have: $F_{1,17}$ = .073, yielding
P = .825, which, being > .5, yields no evidence whatever that a period
of two years exists. Similar computations for T = 3, T = 4, ..., T = 18
have been made and are as recorded at the feet of the appropriate columns
of the table. The smallest P is .152, found when T = 16. Thus the
odds are about five to one that a period in the neighborhood of 16 years
exists. The F for this case is an $F_{15,3}$. We have but three degrees of freedom
available in the error variance. Should a real period of 16 years
exist it is not surprising that data covering but 21 years are unable to
establish it with satisfactory certainty.
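The grouping by phase and the variance ratio of (22) can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the computation used for the table: the function name `period_test` and the argument `trend_terms` are hypothetical, only integral periods are handled, and the residuals are those of column 4 (in hundreds). It returns $\eta^2$, F, $\epsilon^2$ by (26), and the degrees of freedom; P is not computed.

```python
RESIDUALS = [-1130, 258, 235, -179, 648, -385, -1091, 1285, 1835, 837, 850,
             -353, -993, -1137, -451, -716, -922, -402, 408, 203, 1199]


def period_test(residuals, period, trend_terms=2):
    """Correlation-ratio test for an integral period.

    trend_terms is the number of trend coefficients removed besides the
    mean (2 for a quadric trend), so the error d.f. are N - trend_terms - k.
    """
    n = len(residuals)
    classes = {}
    for i, r in enumerate(residuals):
        classes.setdefault(i % period, []).append(r)   # same phase, same class
    k = len(classes)
    grand = sum(residuals) / n
    ss_total = sum((r - grand) ** 2 for r in residuals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in classes.values())
    df_num, df_den = k - 1, n - trend_terms - k
    eta2 = ss_between / ss_total
    f = (ss_between / df_num) / ((ss_total - ss_between) / df_den)
    eps2 = 1 - (n - 1 - trend_terms) / df_den * (1 - eta2)
    return eta2, f, eps2, (df_num, df_den)


eta2_2, f_2, eps2_2, df_2 = period_test(RESIDUALS, 2)
eta2_16, f_16, eps2_16, df_16 = period_test(RESIDUALS, 16)
```

For a period of 2 years the sketch gives a variance ratio of about .073 on 1 and 17 degrees of freedom, as quoted above; for 16 years it gives 15 and 3 degrees of freedom, with $\eta^2$ close to .97. Small differences from printed values are to be expected, since the residuals are rounded to hundreds.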

The advantages of employing (1−P) in lieu of $\eta^2$, in plotting and interpreting
a periodogram wherein the number of degrees of freedom is
small, follow from the argument to this point and are exemplified in
Chart II.

To a degree the advantages of (1−P) are also inherent in $\epsilon^2$, as is indicated
by the periodogram curve based upon $\epsilon^2$. The earlier derivation
by the writer of $\epsilon^2$ gave the relationship

$\epsilon^2 = 1 - \frac{N - 1}{N - k}(1 - \eta^2)$ (24)

but this was postulated upon using a zero order regression equation,
(1). When a linear trend is taken out the relationship is

$\epsilon^2 = 1 - \frac{N - 2}{N - 1 - k}(1 - \eta^2)$. (25)

If a quadric trend is removed we have

$\epsilon^2 = 1 - \frac{N - 3}{N - 2 - k}(1 - \eta^2)$ (26)


and so forth for the removal of trends of higher parabolic order. The
values of $\epsilon^2$ given in the table were computed by (26). The median $\epsilon^2$
value of the table is .012 which differs but slightly from .000, the value
to be expected if there is no period in the data.
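A worked instance may make the size of the correction plain. The general form written first is an extrapolation of the pattern of (24) to (26), with p denoting the number of trend coefficients removed besides the mean (a symbol introduced here for illustration); the numerical case takes N = 21, k = 16, p = 2, and, purely for illustration, $\eta^2 = .969$.

```latex
\epsilon^2 = 1 - \frac{N - 1 - p}{N - p - k}\,(1 - \eta^2)
\qquad (p = 0, 1, 2 \text{ reproduce (24), (25), (26)})

\epsilon^2 = 1 - \frac{21 - 3}{21 - 2 - 16}\,(1 - .969)
           = 1 - 6(.031) = .814
```

With only three degrees of freedom in the error variance the multiplier is 6, so that each thousandth by which $\eta^2$ falls short of unity lowers $\epsilon^2$ by six thousandths.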

The periodogram analysis here discussed may be thought of as terminating
the problem only in case no period is discovered. If the analysis
does establish one or more periods, their further specification could
be accomplished by fitting, say, a Fourier series, by a method which is
appropriate when the time span of the data is not an exact multiple of
the period, or periods.





CORRELATION CONCEPTS AND THE 
DOOLITTLE METHOD 


By Dudley J. Cowden
University of North Carolina 


AMONG THE MANY CONCEPTS of correlation, one which is especially
well adapted to the Doolittle method, outlined by Paul S. Dwyer
in the December, 1942, issue of this JOURNAL, is that the coefficient of
determination $r^2$ is the proportion of variation in the dependent variable
that has been explained by variations in the independent variable
or variables; or it is the ratio of the explained variation to the total
variation. The chief purpose of this article is to indicate how the Doolittle
method fits in with this concept.

In this paper we shall use $x = X - \bar{X}$ to indicate a deviation from the
mean, and following Dwyer, the highest subscript will be assigned to
the dependent variable. Where two subscripts follow the letter x, the
first one will refer to the dependent variable, and the meaning of the
symbol will be a deviation from the mean of the explained portion of
the dependent variable, i.e., the value of a deviation from the mean as
computed from the estimating equation involving the independent
variable indicated. If there are two or more independent variables in the
equation, the subscripts referring to those variables will be enclosed in
parentheses. A subscript or subscripts following a decimal point will
always indicate that the variable, or those variables, have been “held
constant.” For a three variable problem, the various coefficients of determination
involving the dependent variable, and their conceptual
formulas are given below.


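Simple correlation

$r^2_{31} = \frac{\Sigma x^2_{31}}{\Sigma x^2_3}$;    $r^2_{32} = \frac{\Sigma x^2_{32}}{\Sigma x^2_3}$

Multiple correlation

$r^2_{3(12)} = \frac{\Sigma x^2_{3(12)}}{\Sigma x^2_3}$

Partial correlation

$r^2_{32.1} = \frac{\Sigma x^2_{32.1}}{\Sigma x^2_{3.1}}$;    $r^2_{31.2} = \frac{\Sigma x^2_{31.2}}{\Sigma x^2_{3.2}}$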


Reference to the accompanying diagram serves to visualize the concept.
The total variation $\Sigma x^2_3$ is represented by the middle square in the
bottom row. In the two upper corners are shown the gross amounts of
variation $\Sigma x^2_{31}$ and $\Sigma x^2_{32}$ explained by the two independent variables.
Because of factors common to these two variables there is an overlapping
in the explained areas, which is indicated by the middle rectangle
at the top of the diagram. In the central square of the diagram $\Sigma x^2_{3(12)}$
represents the amount of variation explained by the two independent
variables taken together. To the left of that polygon is shown the additional
amount of variation, $\Sigma x^2_{32.1}$, accounted for by variable 2 over and
above the gross amount explained by variable 1; while to the right is
shown the net variation $\Sigma x^2_{31.2}$ explained by variable 1. The diagram
indicates that the amount of variation explained by variables 1 and 2
taken together is made up of:

(a) gross explained by 1 plus net explained by 2; or

(b) gross explained by 2 plus net explained by 1; or

(c) net explained by 1 plus net explained by 2 plus amount explained
by factors common to 1 and 2.

The amount of unexplained variation remaining after using variable 1
alone, $\Sigma x^2_{3.1}$, and therefore the amount which it is sought to explain by
the net contribution of variable 2, is shown in the lower left hand corner
of the diagram. Finally, the amount still to be explained after variable
2 alone has been used, $\Sigma x^2_{3.2}$, is shown in the lower right hand corner.
These two amounts are the denominators for the partial correlation
coefficients $r^2_{32.1}$ and $r^2_{31.2}$ respectively.

DIAGRAMMATIC REPRESENTATION OF TYPES OF VARIATION INVOLVED IN
SIMPLE, MULTIPLE, AND PARTIAL CORRELATION
[Diagram: nine large squares, each of 100 small squares, with shaded polygons showing the total, gross explained, net explained, and unexplained variation.]

Since each of the nine large squares is composed of 100 small squares
it is easy to compute the area of the different polygons. If this is done it
will be seen that the values of the various coefficients of determination
are as follows:

$r^2_{31} = \frac{64}{100} = .64$;    $r^2_{32} = \frac{36}{100} = .36$

$r^2_{3(12)} = \frac{76}{100} = .76$

$r^2_{32.1} = \frac{12}{36} = .333$;    $r^2_{31.2} = \frac{40}{64} = .625$.


We shall now turn our attention to computational methods, using
four variables as an illustration. First, to make estimates of variable 4
from variable 1, we may use the estimating equation $x_{41} = b_{41}x_1$, in which
$b_{41} = \frac{\Sigma x_4x_1}{\Sigma x^2_1}$. We could then obtain $\Sigma x^2_{41}$ by squaring the N values of
$x_{41}$ and summing them. The Doolittle solution accomplishes the same
result by the expression $\Sigma x^2_{41} = b_{41}\Sigma x_4x_1$.

The amount of variation in variable 4 explained by variable 2, with
variable 1 held constant, could literally be obtained by first adjusting
variables 4 and 2 for variable 1 and then proceeding in a manner similar
to that in the preceding paragraph. Thus, we could obtain $\Sigma x^2_{42.1}$ as
follows:

(1) Compute the N values of $x_{4.1} = x_4 - x_{41}$, and the N values of
$x_{2.1} = x_2 - x_{21}$;

(2) Compute the N values of $x_{4.1}x_{2.1}$ and sum them;

(3) Square the N values of $x_{2.1}$ and sum them;

(4) Compute $b_{42.1} = \frac{\Sigma x_{4.1}x_{2.1}}{\Sigma x^2_{2.1}}$;

(5) Compute $\Sigma x^2_{42.1} = b_{42.1}\Sigma x_{4.1}x_{2.1}$.
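The five steps above can be sketched directly. The following Python fragment is an illustration only; the data values and the helper names `deviations` and `sum_products` are hypothetical and are not taken from any problem in this paper.

```python
def deviations(values):
    """Deviations from the mean, x = X - mean(X)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def sum_products(a, b):
    """Sum of products of paired items."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical observations on variables 1, 2 and 4.
x1 = deviations([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = deviations([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x4 = deviations([3.0, 4.0, 8.0, 7.0, 12.0, 11.0])

b41 = sum_products(x4, x1) / sum_products(x1, x1)
b21 = sum_products(x2, x1) / sum_products(x1, x1)

# (1) adjust variables 4 and 2 for variable 1
x4_1 = [a - b41 * c for a, c in zip(x4, x1)]
x2_1 = [a - b21 * c for a, c in zip(x2, x1)]
# (2) sum of products of the adjusted values
s_42 = sum_products(x4_1, x2_1)
# (3) sum of squares of the adjusted variable 2
s_22 = sum_products(x2_1, x2_1)
# (4) net regression coefficient
b42_1 = s_42 / s_22
# (5) variation in 4 explained by 2 with 1 held constant
ss_42_1 = b42_1 * s_42

# The shorter route: total covariation less explained covariation.
s_42_short = sum_products(x4, x2) - b41 * sum_products(x2, x1)
```

The last line anticipates the shorter route by way of total and explained covariation; in this sketch the two values of the adjusted sum of products agree to within rounding.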
Another method, which serves further to clarify our concept of correlation,
is based on the principles:

Explained variation plus unexplained variation equals total variation;
Explained covariation plus unexplained covariation equals total covariation.

By the first principle, as is well known,

$\Sigma x^2_{2.1} = \Sigma x^2_2 - \Sigma x^2_{21}$;

and by the second,

$\Sigma x_{4.1}x_{2.1} = \Sigma x_4x_2 - \Sigma x_{41}x_{21}$.

Thus we could compute the N values of $x_{41}$ and the N values of $x_{21}$, take
the product of each of the N pairs, sum them, and subtract the sum
from $\Sigma x_4x_2$. However, just as

$\Sigma x^2_{21} = b_{21}\Sigma x_2x_1$, and
$\Sigma x^2_{41} = b_{41}\Sigma x_4x_1$; so
$\Sigma x_{41}x_{21} = b_{41}\Sigma x_2x_1$, or
$b_{21}\Sigma x_4x_1$.

The Doolittle solution employs whichever of these two methods for
$\Sigma x_{41}x_{21}$ that the computer prefers.
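By extension of the method indicated above, we obtain, by the Doolittle
solution:

$\Sigma x_{4.12}x_{3.12} = \Sigma x_4x_3 - \Sigma x_{4(12)}x_{3(12)}$
$= \Sigma x_4x_3 - \Sigma x_{41}x_{31} - \Sigma x_{42.1}x_{32.1}$
$= \Sigma x_4x_3 - b_{31}\Sigma x_4x_1 - b_{32.1}\Sigma x_{4.1}x_{2.1}$, or
$\Sigma x_4x_3 - b_{41}\Sigma x_3x_1 - b_{42.1}\Sigma x_{3.1}x_{2.1}$;

$b_{43.12} = \frac{\Sigma x_{4.12}x_{3.12}}{\Sigma x^2_{3.12}}$;

$\Sigma x^2_{43.12} = b_{43.12}\Sigma x_{4.12}x_{3.12}$.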


The total amount of variation explained by several variables is obtained
by adding to the gross amount explained by variable 1 the successive
additional amounts explained by other variables as they are
included successively in the estimating equation. Thus:

$\Sigma x^2_{4(123)} = \Sigma x^2_{41} + \Sigma x^2_{42.1} + \Sigma x^2_{43.12}$.

The multiple coefficient of determination

$r^2_{4(123)} = \frac{\Sigma x^2_{4(123)}}{\Sigma x^2_4}$,

and the partial coefficient of determination

$r^2_{43.12} = \frac{\Sigma x^2_{43.12}}{\Sigma x^2_{4.12}} = \frac{\Sigma x^2_{43.12}}{\Sigma x^2_4 - \Sigma x^2_{4(12)}}$.

The Doolittle solution may also be used, though somewhat less conveniently,
to obtain the above numerator by the expression

$\Sigma x^2_{43.12} = \Sigma x^2_{4(123)} - \Sigma x^2_{4(12)}$, where

$\Sigma x^2_{4(12)} = b_{41.2}\Sigma x_4x_1 + b_{42.1}\Sigma x_4x_2$, and
$\Sigma x^2_{4(123)} = b_{41.23}\Sigma x_4x_1 + b_{42.13}\Sigma x_4x_2 + b_{43.12}\Sigma x_4x_3$.
TABLE I 


VALUES NEEDED FOR SIMPLE, MULTIPLE, AND PARTIAL CORRELATION 
AS OBTAINED BY DOOLITTLE SOLUTION 




























































































| | : 
lola] @o | « ! i) =| Pl [3] [4] 
(1) rr, | Lrs721 Lr21 Trex —1 
@) | | za | Deere Zr || 0 _ 
(3) | rk be a || 0 0 -1 
(4) | | Erk, | 0 0 0 ~$ 
I =r, bak Te a Trt Trex | -1 
. 3 8 | ba ba: ba | —1/2z%, 
II | rr: | 221.123.1 LiFe |) bs -1 
II’ a bass bass ba /Zr%s.1 —1/Er%s.: 
III Sz%s.e | Se 12Ta.1s bus bars =i 
III’ | 1 bas | bar.s/Dz%s 12 bes 1 /Dr%s 12 —1/Z2r%s 12 
= ‘one | 
IV | Tr ase Dara Das as Daria —1 
IV’ | Bar se/Ez% 128 | Bes.1s/Er% 12a | Bea as /DT% 1 —1/2%_ 133 
=I i] 
i | 22%, rr*n rr'n rr | 1/2zr%; 
ii Tr*s.1 Dr%ss1 Tr%X1 i] b%3,/Ezr%:1 1/Ers.1 
iii Tr*s 1s Tr%s.12 l b% 2/Er%1s | bts /Ez% 12 1/Z%1 
iv Tr%,_ 128 ii bry sa/Dzr% 1 bs 13/D2T%_ 198 bs us Try is 1/22%_ 12 
7 
z | zr; Tr*n Tr*n Prva i /=zr 
Lil Zzr% | Sr%«:1) Ert%q) |] 1/Er%.. /Zr%s1 
Zii rr*s Tzr%4(122) | 1/Zr*; 29 1/Z2*s.1 1/Zr% 1 
Ziv | =r 1/Zzr*; 20 1/Zz*s, 1% 1/Z2z%s 12 1/Zz% 120 











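If the Doolittle solution is carried out as suggested here, in deviation
form rather than in standard form, one is spared the labor of computing
the different zero-order coefficients, as well as of converting the $\beta$
coefficients into b coefficients. On the other hand, one is deprived of the
preliminary information which the different zero-order coefficients
provide. Also, it will often be found desirable to code the data so that
the various sums of squares and sums of products do not vary too
greatly in magnitude. If this is not done one may run into numbers
that exceed the capacity of the calculating machine. Such coding can
be done by multiplying variables, where necessary, by appropriate
powers of 10; and such coding may be done after the sums of squares
and sums of products have been obtained.

TABLE II

ALTERNATE NOTATIONS FOR RIGHT SIDE OF DOOLITTLE SOLUTION

Line | [1] | [2] | [3] | [4]
================================================================
I | $-1$ | | |
I′ | $-1/\Sigma x^2_1$ | | |
II | $b_{21}$ | $-1$ | |
II′ | $\Sigma x_1x_2/\Sigma x^2_1\Sigma x^2_{2.1}$ | $-1/\Sigma x^2_{2.1}$ | |
III | $b_{31.2}$ | $b_{32.1}$ | $-1$ |
III′ | $\Sigma x_{1.2}x_{3.2}/\Sigma x^2_{1.2}\Sigma x^2_{3.12}$ | $\Sigma x_{2.1}x_{3.1}/\Sigma x^2_{2.1}\Sigma x^2_{3.12}$ | $-1/\Sigma x^2_{3.12}$ |
IV | $b_{41.23}$ | $b_{42.13}$ | $b_{43.12}$ | $-1$
IV′ | $\Sigma x_{1.23}x_{4.23}/\Sigma x^2_{1.23}\Sigma x^2_{4.123}$ | $\Sigma x_{2.13}x_{4.13}/\Sigma x^2_{2.13}\Sigma x^2_{4.123}$ | $\Sigma x_{3.12}x_{4.12}/\Sigma x^2_{3.12}\Sigma x^2_{4.123}$ | $-1/\Sigma x^2_{4.123}$
================================================================
i | $1/\Sigma x^2_1$ | | |
ii | $\Sigma x^2_{21}/\Sigma x^2_1\Sigma x^2_{2.1}$ | $1/\Sigma x^2_{2.1}$ | |
iii | $\Sigma x^2_{31.2}/\Sigma x^2_{1.2}\Sigma x^2_{3.12}$ | $\Sigma x^2_{32.1}/\Sigma x^2_{2.1}\Sigma x^2_{3.12}$ | $1/\Sigma x^2_{3.12}$ |
iv | $\Sigma x^2_{41.23}/\Sigma x^2_{1.23}\Sigma x^2_{4.123}$ | $\Sigma x^2_{42.13}/\Sigma x^2_{2.13}\Sigma x^2_{4.123}$ | $\Sigma x^2_{43.12}/\Sigma x^2_{3.12}\Sigma x^2_{4.123}$ | $1/\Sigma x^2_{4.123}$
================================================================
Σi | $1/\Sigma x^2_1$ | | |
Σii | $\Sigma x^2_2/\Sigma x^2_1\Sigma x^2_{2.1}$ | $1/\Sigma x^2_{2.1}$ | |
Σiii | $\Sigma x^2_{3.2}/\Sigma x^2_{1.2}\Sigma x^2_{3.12}$ | $\Sigma x^2_{3.1}/\Sigma x^2_{2.1}\Sigma x^2_{3.12}$ | $1/\Sigma x^2_{3.12}$ |
Σiv | $\Sigma x^2_{4.23}/\Sigma x^2_{1.23}\Sigma x^2_{4.123}$ | $\Sigma x^2_{4.13}/\Sigma x^2_{2.13}\Sigma x^2_{4.123}$ | $\Sigma x^2_{4.12}/\Sigma x^2_{3.12}\Sigma x^2_{4.123}$ | $1/\Sigma x^2_{4.123}$
================================================================
 | 1 | | |
 | $r^2_{21}$ | 1 | |
 | $r^2_{31.2}$ | $r^2_{32.1}$ | 1 |
 | $r^2_{41.23}$ | $r^2_{42.13}$ | $r^2_{43.12}$ | 1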

Table I gives the Doolittle solution in symbolic form. The eight sections
to the table are set off by double lines. The upper left section
shows the sums of squares and sums of products of deviations from
the mean. In the upper right section is the identity matrix, with reversed
signs. (The signs are reversed in order to obtain the correct sign
for the different coefficients of estimation.) The next to the top section
contains the forward solution. The next to the bottom section contains
products of pairs of items in corresponding cells of the forward solution.
The bottom section contains cumulative sums of the items in the same
column in the section immediately above.
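The layout just described can be sketched in code. The following Python fragment is a minimal illustration of a forward solution in deviation form with the negative identity matrix carried along on the right; it is not offered as the exact routine of Dwyer or of this paper, the function names are hypothetical, and the three-variable sums of squares and products are invented for the example. Cells that Table I leaves blank appear here as zeros.

```python
def doolittle_forward(s):
    """Forward solution on a symmetric matrix s of sums of squares and products.

    Returns the unprimed lines (I, II, ...) and the primed lines (I', II', ...),
    each carrying the right-hand columns that begin as minus the identity.
    """
    m = len(s)
    rows = [list(s[i]) + [-1.0 if j == i else 0.0 for j in range(m)]
            for i in range(m)]
    unprimed, primed = [], []
    for i in range(m):
        row = rows[i][:]
        for u, p in zip(unprimed, primed):
            factor = p[i]                      # b coefficient from an earlier line
            row = [a - factor * b for a, b in zip(row, u)]
        pivot = row[i]                         # e.g. sum of squares of x_{2.1}
        unprimed.append(row)
        primed.append([a / pivot for a in row])
    return unprimed, primed


def products_and_sums(unprimed, primed):
    """Cell-by-cell products of paired lines, and their cumulative column sums."""
    products = [[a * b for a, b in zip(u, p)] for u, p in zip(unprimed, primed)]
    sums, running = [], [0.0] * len(products[0])
    for row in products:
        running = [r + v for r, v in zip(running, row)]
        sums.append(running)
    return products, sums


# Hypothetical sums of squares and products for three variables.
S = [[4.0, 2.0, 2.0],
     [2.0, 3.0, 1.0],
     [2.0, 1.0, 5.0]]
lines, primed_lines = doolittle_forward(S)
prods, cum = products_and_sums(lines, primed_lines)
```

In this example the final cumulative sums on the left return the original sums of squares 4, 3 and 5, and those on the right return the reciprocals of the fully adjusted sums of squares, which are also the diagonal entries of the inverse of S.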

The left hand side of the table gives us entries by which we may easily
compute, among other coefficients, the following: $r^2_{41}$, $r^2_{42.1}$, $r^2_{43.12}$,
$r^2_{4(12)}$, $r^2_{4(123)}$. Corresponding standard errors of estimate, and variance
ratios for testing significance, may likewise be computed.

The symbols recorded in the right-hand side of Table I illustrate a
concept of correlation somewhat different from the concept in which we
are mainly interested. A coefficient of simple or partial correlation
may be thought of as the slope of the estimating coefficient adjusted for
differences in variability of two variables. Thus:

$r_{21} = b_{21}\frac{\sigma_1}{\sigma_2} = b_{21}\frac{\sigma_{1.2}}{\sigma_{2.1}}$, and $r_{41.23} = b_{41.23}\frac{\sigma_{1.234}}{\sigma_{4.123}}$.

Therefore,

$r^2_{21} = b^2_{21}\frac{\Sigma x^2_{1.2}}{\Sigma x^2_{2.1}}$, and $r^2_{41.23} = b^2_{41.23}\frac{\Sigma x^2_{1.234}}{\Sigma x^2_{4.123}}$.

It is this method by which a new set of coefficients of partial determination
are obtained from this side of the table. All that needs be done is to
divide an entry in a cell in the next to the bottom section by the entry
in the cell in corresponding position in the bottom section.

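The derivation of the sums of products in the lower right-hand section
of Table I may not be obvious; so that for $1/\Sigma x^2_{1.23}$ will be given
as an illustrating procedure, using concepts developed earlier in this
paper.

$\frac{1}{\Sigma x^2_1} + \frac{b^2_{21}}{\Sigma x^2_{2.1}} = \frac{\Sigma x^2_{2.1} + b^2_{21}\Sigma x^2_1}{\Sigma x^2_1\Sigma x^2_{2.1}} = \frac{\Sigma x^2_{2.1} + \Sigma x^2_{21}}{\Sigma x^2_1\Sigma x^2_{2.1}} = \frac{\Sigma x^2_2}{\Sigma x^2_1\Sigma x^2_{2.1}} = \frac{1}{\Sigma x^2_{1.2}}$

$\frac{1}{\Sigma x^2_{1.2}} + \frac{b^2_{31.2}}{\Sigma x^2_{3.12}} = \frac{\Sigma x^2_{3.12} + b^2_{31.2}\Sigma x^2_{1.2}}{\Sigma x^2_{1.2}\Sigma x^2_{3.12}} = \frac{\Sigma x^2_{3.12} + \Sigma x^2_{31.2}}{\Sigma x^2_{1.2}\Sigma x^2_{3.12}} = \frac{\Sigma x^2_{3.2}}{\Sigma x^2_{1.2}\Sigma x^2_{3.12}}$ *

$= \frac{\Sigma x^2_{3.2}\Sigma x^2_{1.23}}{\Sigma x^2_{1.2}\Sigma x^2_{3.12}\Sigma x^2_{1.23}} = \frac{1}{\Sigma x^2_{1.23}}$

* See Chart for visual evidence that $\Sigma x^2_{3.12} + \Sigma x^2_{31.2} = \Sigma x^2_{3.2}$.
Algebraically we have
$\Sigma x^2_{3.12} + \Sigma x^2_{31.2} = \Sigma x^2_3 - \Sigma x^2_{3(12)} + \Sigma x^2_{3(12)} - \Sigma x^2_{32} = \Sigma x^2_3 - \Sigma x^2_{32} = \Sigma x^2_{3.2}$.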





Table II shows the right-hand section of Table I (except that the
identity matrix is omitted to avoid repetition and a new section is appended
at the bottom), but with symbols illustrating the concept of
correlation based on the ratio of explained to total variation. The derivation
of these symbols and their identity with those of Table I is
rather obvious. Although the entries appear somewhat more complicated,
it should be noticed that the denominators of the fractions in
cells of corresponding positions in the two middle sections of the table
are identical. The numerators of these fractions are respectively the
net explained variation and the variation remaining to be explained,
which are the values needed for the various coefficients of partial determination
shown in the bottom section.











OVERESTIMATION OF MEAN SQUARES BY THE 
METHOD OF EXPECTED NUMBERS* 


By R. E. Comstock†
University of Minnesota 


THE METHOD of expected numbers for analysis of the variance of
data with two or more criteria of classification and disproportionate
numbers in the sub-classes was introduced by Snedecor.¹ He frankly
stated that the methods² of fitting constants and of weighted squares
of means were more orthodox in theory. However, he pointed out that 
the new method had a wider range of applicability than either of those 
and that it required less time for computation than the method of fitting 
constants. 

Snedecor and Cox³ compared the three methods on a series of small
samples drawn randomly from a specified population. They stated that 
differences among the results obtained were trivial with the possible 
exception that the mean squares for main effects computed by the 
method of weighted squares of means were usually somewhat smaller 
than the corresponding values obtained by the other two methods. 
They concluded that the method of expected numbers can safely be 
applied to data in which the sub-class numbers do not deviate signifi- 
cantly from proportionality as judged by chi-square. The only case in 
their experience, they said, in which the results were not in essential 
agreement with those of the method of fitting constants was one in 
which the postulation of proportional numbers in the sub-classes of the 
population was untenable and for which, in fact, the test for agreement 
of the observed sub-class numbers to proportionality yielded a highly 
significant value of chi-square. 

Recently the author had occasion to compare the methods of ex- 
pected numbers and of fitting constants in an analysis of sex differences 
in the growth rate of swine. The data used were exactly the same in 
both cases; no data were included in the analysis by fitting constants 
that would have involved sub-classes containing no actual observa- 
tions. Data from seven non-interbreeding lines of one breed were in- 
volved. A separate analysis was made of the data from each line; they 


* Paper No. 2083, Scientific Journal Series, Minnesota Agricultural Experiment Station.

† Now at North Carolina State College.

¹ George W. Snedecor, “The Method of Expected Numbers for Tables of Multiple Classification
with Disproportionate Subclass Numbers,” this Journal, 29: 389-393, 1934.

² F. Yates, “The Analysis of Multiple Classifications with Unequal Numbers in the Different
Classes,” this Journal, 29: 51-66, 1934.

³ George W. Snedecor and Gertrude M. Cox, “Disproportionate Subclass Numbers in Tables of
Multiple Classification,” Iowa Experiment Station Research Bulletin 180, 1935.




were made in a form to give estimates of variance arising from the 
sources indicated below. 


Source of Variation 
Sex 
Years 
Sires within years 
Litters within sires 
Sex × years
Sex × sires within years
Sex × litters within sires
Pigs within sub-classes 


The seven analyses were then aggregated in a single analysis in which 
variances arising from line differences and the interaction of sex and 
lines were also estimated. 

The method of expected numbers was computed as described by 
Snedecor with the exception that the expected sub-class numbers were 
based on a postulate of equal frequency of the two sexes instead of on 
the actual marginal totals for sexes. This reduced the labor of com- 
putation since it yielded numbers involving no fractions other than 
one-half. Tests for departure from the expected numbers so obtained 
produced no significant values of chi-square. The sex frequencies were 
very close to equality in six of the seven lines. (Table I.) It should be 
noted that the above departure from usual procedure does not change 
the estimation of the sex differences. 


TABLE I
THE TOTAL NUMBERS OF EACH SEX IN THE VARIOUS LINES

           Number of pigs
Line       Male      Female
1          102       100
2          143       114
3          137       127
4          125       125
5          160       159
6           41        37
7          110       122
Total      818       784
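
As a rough illustration of the chi-square check described above, the following sketch tests each line's totals in Table I against an equal division of the sexes. It is only a stand-in under stated assumptions: the tests reported in the paper were made on the sub-class numbers, which are not tabulated here, and the function name and the 5 per cent point 3.841 (one degree of freedom) are supplied for the illustration.

```python
# Totals from Table I: line -> (males, females).
TABLE_I = {1: (102, 100), 2: (143, 114), 3: (137, 127), 4: (125, 125),
           5: (160, 159), 6: (41, 37), 7: (110, 122)}

# Assumed reference value: 5 per cent point of chi-square, 1 degree of freedom.
CHI2_5PCT_1DF = 3.841


def chi_square_equal_sexes(males, females):
    """Chi-square (1 d.f.) for departure of two counts from a 1:1 division."""
    expected = (males + females) / 2
    return ((males - expected) ** 2 + (females - expected) ** 2) / expected


chi2_by_line = {line: chi_square_equal_sexes(m, f)
                for line, (m, f) in TABLE_I.items()}

for line, value in chi2_by_line.items():
    verdict = "significant" if value > CHI2_5PCT_1DF else "not significant"
    print(f"line {line}: chi-square = {value:.3f} ({verdict})")
```

On these totals line 2 gives the largest value, about 3.27, which is still below 3.841; the other six lines are all below 1.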


The major difference between the results of the two methods and 
the only one consistent in direction was in the estimates of the sex-litter 
interaction. The mean squares for this interaction (both methods) are 
recorded in Table II along with the mean squares for the differences 
within sub-classes. In every case the interaction mean square is highest 




when calculated by the method of expected numbers. Considering the 
single analysis of the data from all lines taken together the results of 
that method accepted at face value indicate a probability of less than 
0.1 that the sex-litter interaction observed was entirely the consequence 
of random errors. The work on these data was not begun with the 
intention of comparing analytical procedures but the conclusions that 


TABLE II
MEAN SQUARES FOR SEX-LITTER INTERACTION OBTAINED
BY DIFFERENT METHODS OF ANALYSIS

        Interaction mean square     Within           Degrees of freedom
Line    Expected     Fitting        sub-classes      Interaction   Within
        numbers      constants      mean square                    sub-classes
1       .0307        .0273          .0136             27            134
2       .0301        .0217          .0241             34            167
3       .0235        .0197          .0162             32            186
4       .0305        .0233          .0132             26            182
5       .0161        .0142          .0183             29            241
6       .0256        .0161          .0289              8             52
7       .0239        .0219          .0249             33            150
1-7     .0257        .0210          .0188            189           1112





might have been based on this particular result of the method of ex- 
pected numbers were considered sufficiently important to justify re- 
analysis by another method. The probability of the amount of inter- 
action present arising from random errors is greater than .05 according 
to the method of fitting constants. 
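
The comparison just described rests on the ratio of the interaction mean square to the within sub-class mean square. A minimal sketch of that arithmetic, using the pooled (lines 1-7) row of Table II, is given below. The variable and function names are illustrative only, and the probabilities quoted in the text would still have to be read from tables of the variance ratio for 189 and 1112 degrees of freedom; they are not computed here.

```python
# Pooled (lines 1-7) mean squares from Table II.
MS_INTERACTION_EXPECTED = 0.0257  # method of expected numbers
MS_INTERACTION_FITTING = 0.0210   # method of fitting constants
MS_WITHIN = 0.0188                # within sub-classes
DF_INTERACTION, DF_WITHIN = 189, 1112


def variance_ratio(ms_effect, ms_error):
    """Ratio of an effect mean square to the error mean square."""
    return ms_effect / ms_error


f_expected = variance_ratio(MS_INTERACTION_EXPECTED, MS_WITHIN)
f_fitting = variance_ratio(MS_INTERACTION_FITTING, MS_WITHIN)

print(f"expected numbers:  F = {f_expected:.3f} "
      f"({DF_INTERACTION} and {DF_WITHIN} d.f.)")
print(f"fitting constants: F = {f_fitting:.3f} "
      f"({DF_INTERACTION} and {DF_WITHIN} d.f.)")
```

The two ratios come to about 1.37 and 1.12, so the same within sub-class variance is being compared with noticeably different interaction estimates.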

Yates stated that the method of fitting constants provides an effi- 
cient test of interaction in a two way table. Snedecor and Cox say that 
it should be looked upon as the standard of comparison. The evidence 
indicates then that in the case being reported the sex-litter interaction 
was over estimated by the method of expected numbers. 

The sum of squares for interaction in the analysis of a two way table
is, symbolically, $\sum f(\bar{x}_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x})^2$ where $\bar{x}_{ij}$ is the mean of the
sub-class in the i-th row and the j-th column, $\bar{x}_{i.}$ is the mean⁴ of the i-th
row, $\bar{x}_{.j}$ the mean⁴ of the j-th column, and $\bar{x}$ the grand mean.⁴ Assuming
no true interaction to exist each separate squared deviation multiplied
by the appropriate f is an estimate of $\frac{d}{n}\sigma^2$ where d is the number
of degrees of freedom for the interaction, n the number of squared
deviates, and $\sigma^2$ the random variance of individual observations.
Summing the n values and dividing by d yields an estimate of $\sigma^2$. If the
value obtained has only a small probability of having arisen by chance


⁴ As estimated by the method of analysis being used.










alone from the same population as the variance estimated from the 
variation within sub-classes, reasonable doubt is cast on the assumption 
of no interaction and it may be concluded that a real interaction exists. 
The appropriate f for each sub-class mean in the expression given
in the preceding paragraph is the sub-class number (k) since the
variance of a mean is $\sigma^2/k$. The procedure of substituting the expected
number (k′) for k results in each value of $f(\bar{x}_{ij}-\bar{x}_{i.}-\bar{x}_{.j}+\bar{x})^2$ estimating
$\frac{k'}{k}\cdot\frac{d}{n}\sigma^2$ instead of $\frac{d}{n}\sigma^2$. The net effect, on the average, is
overestimation. The amount to be expected on this basis can be found by averaging
k′/k over all the sub-classes. This was done for the data concerned and
the results recorded in Table III along with the amount of overestimation
indicated by comparison with the results of fitting constants. It


TABLE III
CALCULATED OVERESTIMATION OF SEX-LITTER INTERACTION MEAN SQUARE BY
THE METHOD OF EXPECTED NUMBERS AND ACTUAL BY COMPARISON
WITH THE RESULTS OF FITTING CONSTANTS

          Overestimation
Line      Expected     Actual
1         .151         .125
2         .267         .386
3         .265         .195
4         .157         .314
5         .164         .129
6         .299         .584
7         .203         .087
1-7       .210         .222





will be noted that the agreement is only fair for the separate analyses. 
There is no apparent method for judging what degree of concordance 
could reasonably be expected. However, the apparent lack of bias indi- 
cated by the remarkably close agreement in the case of the analysis of 
all the data taken together is favorable evidence for the suggested ex- 
planation of the differences in the results of the two methods. 
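
The two columns of Table III can be imitated with a few lines of arithmetic. The sketch below averages k′/k over the sub-classes of some invented litters, taking k′ as half the litter size in keeping with the postulate of equal sex frequency, and separately forms the "actual" overestimation as the ratio of the two interaction mean squares less one. The litter counts and function names are hypothetical; only the pooled mean squares are taken from Table II.

```python
def expected_excess(litters):
    """Average of k'/k over all sub-classes, minus one.

    litters: list of (males, females) counts, both non-zero.
    k' is taken as half the litter size (equal sex frequency postulated).
    """
    ratios = []
    for males, females in litters:
        k_expected = (males + females) / 2
        ratios.extend([k_expected / males, k_expected / females])
    return sum(ratios) / len(ratios) - 1


def observed_excess(ms_expected_numbers, ms_fitting_constants):
    """Proportionate excess of one mean square over the other."""
    return ms_expected_numbers / ms_fitting_constants - 1


hypothetical_litters = [(5, 3), (4, 4), (6, 2), (3, 5)]  # invented counts
balanced_litters = [(4, 4), (3, 3)]                      # invented counts

print(f"invented litters: {expected_excess(hypothetical_litters):.3f}")
print(f"balanced litters: {expected_excess(balanced_litters):.3f}")
print(f"pooled, Table II: {observed_excess(0.0257, 0.0210):.3f}")
```

For a litter of fixed size the average of k′/k is least, and equal to one, when the two sexes are equally represented; any inequality raises it. The pooled ratio formed from the printed mean squares is about .224, against .222 in Table III, a difference that is presumably due to rounding of the printed figures.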
Following the same reasoning applied in considering the estimation 
of variance from interaction it appears that the estimates of variance 
from class differences by the method of expected numbers should also 
be too large. Assuming the class differences to be purely random this 





1 1 
estimate should accordingly approach -= (=) o? averaged over 
gly app Saye) 7 NE g 







all the classes instead of $\sigma^2$. Because, in the case at hand, the expected
numbers are equal for the two sub-classes of each litter, the
overestimation from litter differences should be the same as from the
sex-litter interaction. This is in seeming disagreement with the results
obtained; in the aggregate of the seven separate analyses the variance
estimated from litter differences is .0600 by the method of expected
numbers and .0569 by fitting constants. The difference is only about 5
per cent of the latter figure. However, to the extent that the deviations
were real, substitution of expected for actual sub-class numbers is
immaterial. It is only the random portions of the squared deviations of the
class means that vary in average magnitude with the numbers of
observations on which the means of the sub-classes of a class are based.
The correct expectation then is that the variance estimated from litter
differences obtained from the method of expected numbers will exceed
that from fitting constants by 21 per cent of the random variance,
.0188. This amount is .0039 compared to the actual difference, .0031.
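
The figures in this paragraph can be checked directly. The short sketch below repeats the arithmetic with the values quoted in the text and in Tables II and III; the variable names are illustrative only.

```python
RANDOM_VARIANCE = 0.0188         # within sub-classes mean square, lines 1-7
EXPECTED_OVERESTIMATION = 0.21   # pooled expected overestimation, Table III
MS_LITTERS_EXPECTED = 0.0600     # litter mean square, expected numbers
MS_LITTERS_FITTING = 0.0569      # litter mean square, fitting constants

# Excess predicted if only the random part of the mean square is inflated.
predicted_excess = EXPECTED_OVERESTIMATION * RANDOM_VARIANCE
# Excess actually found between the two methods.
actual_excess = MS_LITTERS_EXPECTED - MS_LITTERS_FITTING
# The actual excess as a fraction of the fitting-constants figure.
relative_excess = actual_excess / MS_LITTERS_FITTING

print(f"predicted excess: {predicted_excess:.4f}")
print(f"actual excess:    {actual_excess:.4f}")
print(f"relative excess:  {relative_excess:.1%}")
```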

It should be noted that the amount of overestimation of variance
from class differences or interaction decreases rather rapidly as the
sub-class numbers increase in size so long as they deviate from
proportionality only as a result of randomness. The case herein reported is
about as extreme as would occur within that limitation.⁵ However,
there will be in the application of the method to the general run of data
a certain frequency, perhaps quite small, of instances like the one herein
reported in which this overestimation will, if not recognized, have a
bearing on conclusions drawn.

While the existence of real class differences or real interaction reduces 
the percentage overestimation of the respective mean squares the mat- 
ter is not critical under those circumstances anyway since the conclu- 
sion that class differences or interactions are real is not then in error. 


SUMMARY 


The actual sub-class numbers in the data concerned did not deviate
significantly, as judged by chi-square, from the expected numbers.
However, the method of expected numbers by comparison with that of
fitting constants overestimated the mean square for the sex-litter
interaction by enough to have a bearing on interpretation of the results and
the mean squares for litters by a slight amount. The differences in
results of the two methods were interpreted to be the consequence of
shifts in weighting squared deviations associated with the shift from
observed to expected numbers. This led to expected results in
reasonable agreement with those observed.

⁵ Snedecor stated “The change from actual to expected numbers is merely a matter of weighting,
usually of minor consequence unless carried to extremes.” Certainly this is an extreme case. However, in
his next sentence Snedecor implied that adjustments great enough to affect the results would not occur
when the chi-square test did not indicate significant deviation of the observed numbers from
proportionality.

The overestimation of mean squares reported should not be
interpreted as a serious defect of the method of expected numbers. When the
condition laid down by Snedecor for the fit of expected numbers to
proportionality is met, the maximum overestimation of class and
interaction mean squares is not large and the type of data for which a very
close approach to this maximum occurs should not be overly frequent.
Certainly the advantages cited by Snedecor when contrasted with the
defect discussed commend the method for wide usage. What has been
presented should actually assist in judging in critical cases whether
bias from the source considered is important, thereby permitting
increased rather than decreased confidence in the usefulness of the method.



















UTILITY OF STATISTICAL METHOD IN AERODYNAMICS 


By Herbert G. Smith


AS IS UNDOUBTEDLY the case in many a new field of endeavor, it is
quite natural, perhaps, that aerodynamics has borrowed several
of its physical laws from other sources. Most of these borrowed laws, 
as well as many of those developed wholly or largely in this field, ad- 
mittedly are empirical in character in that they have been established 
in experimental surroundings, generalizing from relatively limited 
facts. And, just as surveyors readily acknowledge the necessarily ap- 
proximate nature of any measurement, so are most aerodynamic engi- 
neers quick to concede that these physical laws are not established 
firmly beyond question. 

Published materials dealing with aerodynamics, particularly the 
numerous Reports, Technical Memoranda and Technical Notes which 
have been issued by the National Advisory Committee on Aeronautics, 
indicate clearly that the wind tunnel has been the most productive 
source of data from which some of the more recent general physical 
laws of aerodynamics have tentatively been established, and it is 
equally clear that applications of the calculus have been among the 
more frequently used tools for analyzing and developing such data. 

Surprising as it may appear to those familiar with statistical methods 
of analysis, and with the wide applicability of these methods, observa- 
tion indicates that beyond the occasional use of a few of the more 
elementary methods for compiling and arranging data, and for ob- 
taining averages, these methods are practically unused in aerodynamic 
circles, either for establishing or testing laws or for predicting perform- 
ance. One of statistical method’s most powerful devices, correlation, is 
indicated to have been used little if at all for such purposes. One reason 
for this lack of use of the excellent analytic tools of statistical methods is 
the fact that many engineers seem either unaware of their existence 
and availability or are not familiar with their applicability; another 
possible reason is that because of the relatively few data (as few as four 
or five observations, in some instances) from which empirical laws are 
established, statistical methods are admittedly not always appropriate, 
being more suitable, of course, for handling and treating masses of data. 

Empirical laws and actual-performance data.—Even aerodynamic en- 
gineers, themselves, are not always fully satisfied with results obtained 
by the empirical laws they employ. One capable aerodynamicist informed
the writer that every aircraft has its own particular characteristics 
which must be ascertained in flight. Differing somewhat, another engi- 




neer stated rather gloatingly that his company has the reputation of 
building planes conforming more closely to pre-fabrication predictions 
than is the general case. Thus, there appears ample justification for 
questioning some of the fundamental aerodynamic laws—particularly 
in the terms in which most commonly stated—and for suggesting other, 
supplementary methods for research and analysis. 

One example of a fundamental law which may legitimately be ques- 
tioned in the form usually stated is that established by Daniel Bernoulli 
(1700-1782), now popularly adapted to airfoils: As velocity of a given 
volume of a fluid increases, pressures decrease. But aviation engineers 
with whom the writer has discussed this law invariably hasten to 
qualify the general, popular version of it to the following: For a flow of a 
given volume of a fluid, total pressure remains constant regardless of 
velocity; as velocity increases, however, dynamic pressures increase, 
and static pressures decrease. 

Having been advised by several competent aerodynamic authorities 
that most published actual-performance data on airplanes are reason- 
ably comparable, representative and reliable, the writer has made use 
of these generally available (except on recent ships) and relatively 
numerous data, rather than the more limited and less available wind 
tunnel data, in making certain analyses by the statistical methods of 
correlation. (These methods are, of course, quite as suitable for use with 
laboratory data as with actual-performance data.) The general ap- 
plicability, as illustrated by the results of two such analyses reported 
here, may be of interest to statisticians, indicating, as they do, a po- 
tentially fertile field; also, should these pages come to the attention of 
aerodynamic engineers, it is hoped that this discussion may prove 
thought-provoking, indicating, as it attempts to do, how statistically 
derived laws may supplement empirically derived laws to advantage. 

How lift depends on speed.—The first illustration: Consider the em- 
pirical law which, in general terms, holds that lift varies directly as the 
square of speed. To test this law, actual-performance data on 52 ran- 
domly selected and reasonably comparable military monoplanes were 
studied. Power loading¹ was correlated against speed and against the
square of speed; and wing loading² was likewise correlated against speed
and against the square of speed. The following results were obtained:³


Power loading × Speed, r = −0.643 ± 0.0548
For estimating power loading from speed,
Y = 19.53 lbs. − (0.0353 lbs. × speed) ± 2.02 lbs.

¹ Pounds of gross weight per horsepower.

² Pounds of gross weight per square foot of supporting area.

³ The measure of reliability employed here in all instances is the probable error, or 0.6745 of the
standard error.




Power loading × Speed squared, r = −0.604 ± 0.0595
For estimating power loading from speed squared,
Y = 14.85 lbs. − (0.0000617 lbs. × speed squared) ± 2.11 lbs.
Wing loading × Speed, r = +0.832 ± 0.0288
For estimating wing loading from speed,
Y = 5.52 lbs. + (0.0775 lbs. × speed) ± 2.48 lbs.
Wing loading × Speed squared, r = +0.795 ± 0.0344
For estimating wing loading from speed squared,
Y = 15.69 lbs. + (0.000138 lbs. × speed squared) ± 2.71 lbs.


Whether lift is considered in terms of power loading or in terms of 
wing loading, an examination of the r’s and their probable errors 
clearly indicates that, contrary to empirical practice, it is somewhat 
more logical to consider lift in terms of unaffected speed than in terms 
of speed squared. In each instance, the Y obtained will serve, within 
reason, as a “law” of how, for the type of airplane considered, etc., lift 
may be expected to vary with speed or with the square of speed as the 
case may be.⁴
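
The probable errors attached to the r's above can be reproduced, to within rounding, from the number of aircraft alone. The sketch below assumes the usual large-sample standard error of a correlation coefficient, (1 − r²)/√N, which the text does not state explicitly; the footnote says only that the probable error is 0.6745 of the standard error. The function name is illustrative.

```python
import math

N_AIRCRAFT = 52


def probable_error_of_r(r, n):
    """0.6745 times the assumed standard error (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# label -> (r as printed, probable error as printed)
REPORTED = {
    "power loading x speed": (-0.643, 0.0548),
    "power loading x speed squared": (-0.604, 0.0595),
    "wing loading x speed": (0.832, 0.0288),
    "wing loading x speed squared": (0.795, 0.0344),
}

for label, (r, printed) in REPORTED.items():
    computed = probable_error_of_r(r, N_AIRCRAFT)
    print(f"{label}: computed {computed:.4f}, printed {printed:.4f}")
```

The computed values agree with the printed ones to within about one unit in the fourth decimal place, which is consistent with the r's having been rounded to three places before printing.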

Horsepower requirements.—As a second illustration of the applica- 
bility of correlation methods for testing aerodynamic hypotheses and 
establishing laws for prediction purposes, consider how horsepower re- 
quirements vary with speed. The results of this investigation, hitherto 
unpublished, are presented briefly. 

On the accompanying figure, for purposes of comparison horsepower 
per gross pound has been tallied respectively against speed, against 
speed squared and against speed cubed. The Y-curve in each section 
may prove useful for predicting how horsepower requirements depend 
on speed, speed squared or speed cubed, respectively; the A-curve may 
prove useful for predicting speed from horsepower; the B-curve, for 
predicting speed squared from horsepower; and, the C-curve for pre- 
dicting speed cubed from horsepower. The dashed lines outline the 
probable errors of the prediction curves. The performance data em- 
ployed are for the same 52 military monoplanes for which the lift 
analysis (supra) is reported.⁵ The results:

For Horsepower × Speed, r = +0.650 ± 0.0541

Y = (0.009735 + 0.0003975A) ± 0.02235

A = (136.41 + 1062.0Y) ± 36.2

For Horsepower × Speed squared, r = +0.644 ± 0.0548

Y = (0.06008 + 0.000000734B) ± 0.02250

B = (6,710.0 + 562,000.0Y) ± 19,650.0


⁴ For additional details and comments on the foregoing study, see the writer's “How Lift Varies
with Speed,” Aero Digest for February 1943.

⁵ Most data on current aircraft are restricted for military reasons. The data employed here
appeared variously in print early in 1942, and may be said to be for ships now largely obsolete.










[Figure: HOW HORSEPOWER REQUIREMENTS VARY WITH AIRPLANE SPEED. Horsepower per gross pound tallied against speed, against speed squared (class limits: add ,000 or ,999), and against speed cubed (class limits: add 0,000 or 9,999).]










For Horsepower × Speed cubed, r = +0.619 ± 0.0575
Y = (0.07618 + 0.000000001704C) ± 0.02305
C = (−4,737,500.0 + 225,000,000.0Y) ± 8,410,000.0


Here the correlation coefficients again agree closely, each being 
moderately high; yet, considered in connection with their delimiting 
measures, it is clear that for the data analyzed, it is more valid to con- 
sider horsepower requirements dependent upon speed unaffected than it 
is to consider them dependent upon speed squared, or upon speed 
cubed. In short, it is indicated here that the empirical aerodynamic 
law, “horsepower varies with the cube of speed,” must not be depended 
upon too greatly. 
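
One internal check on the paired prediction equations is that, in simple linear correlation, the product of the two regression slopes equals r². The sketch below applies that standard identity to the slopes printed above; the dictionary layout and function name are illustrative only.

```python
import math

# predictor -> (slope of Y on predictor, slope of predictor on Y, printed r)
SLOPES = {
    "speed": (0.0003975, 1062.0, 0.650),
    "speed squared": (0.000000734, 562000.0, 0.644),
    "speed cubed": (0.000000001704, 225000000.0, 0.619),
}


def r_from_slopes(b_yx, b_xy):
    """Correlation implied by two regression slopes of the same sign."""
    sign = 1.0 if b_yx >= 0 else -1.0
    return sign * math.sqrt(b_yx * b_xy)


for predictor, (b_yx, b_xy, printed_r) in SLOPES.items():
    implied = r_from_slopes(b_yx, b_xy)
    print(f"{predictor}: implied r = {implied:.3f}, printed r = {printed_r:.3f}")
```

The implied values differ from the printed r's by at most about .002, so the three pairs of equations are consistent with the coefficients reported.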

The following data, also developed, may prove of incidental interest: 


Average horsepower, 52 aircraft, 0.1106 ± 0.0294, per gross lb.
Average speed, 254.0 ± 47.7 M.P.H.

Average speed squared, 68,900 ± 25,700

Average speed cubed, 20,150,000 ± 10,700,000
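
The ± figures attached to the prediction equations appear to be tied to the ± figures on these averages by the factor √(1 − r²), the usual relation between the spread of a variable and the error of estimating it from a correlated one. This is an inference from the printed numbers, not something the text states, and the names below are illustrative.

```python
import math


def error_of_estimate(spread, r):
    """Spread of a variable reduced by the factor sqrt(1 - r^2)."""
    return spread * math.sqrt(1 - r * r)


# label -> (spread printed with the average, printed r, printed error of estimate)
CASES = {
    "horsepower from speed": (0.0294, 0.650, 0.02235),
    "speed from horsepower": (47.7, 0.650, 36.2),
    "horsepower from speed squared": (0.0294, 0.644, 0.02250),
    "speed squared from horsepower": (25700.0, 0.644, 19650.0),
    "horsepower from speed cubed": (0.0294, 0.619, 0.02305),
    "speed cubed from horsepower": (10700000.0, 0.619, 8410000.0),
}

for label, (spread, r, printed) in CASES.items():
    computed = error_of_estimate(spread, r)
    print(f"{label}: computed {computed:.6g}, printed {printed:.6g}")
```

In all six cases the computed value is within half of one per cent of the printed one.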


The validity of these findings is further supported by the consistency 
of indications determined in making a considerable number of other 
analyses of similar nature. 

While the methods of simple, or linear, correlation cannot logically 
be expected alone always to supply all desired information, the writer’s 
studies lead him to believe that aerodynamic engineers could gain 
much contributory information were these and other methods of statis- 
tical analysis to be made at least supplemental to their presently used 
analytical procedure. It may be said that when these methods are ex- 
tended to include partial and multiple correlation techniques some of 
the objections which might legitimately be raised are obviated. For 
example, “lift” is not dependent upon speed alone; drag and other fac- 
tors are influencing factors. Thus, by these extensions of the methods 
of simple correlation the influences of several factors may be considered 
—the influence of one or more may be eliminated, or their joint-effect 
may be weighed. Too, under some instances it is possible that some 
types of data may be analyzed most satisfactorily by employing the 
methods of curvilinear correlation. 











ON MEASURES OF DISPERSION FOR A FINITE 
DISTRIBUTION 


By Albert O. Hirschman


IT IS OBVIOUS that the measures of central tendency and of dispersion
could be applied, for descriptive purposes, to various sections or
sub-groups of frequency distributions. In particular, the mean and the
median often divide distributions into two groups each of which may
deserve separate investigation. The dichotomy of language often leads
us to speak of “good” and “bad” students, of “rich” and “poor”
countries, of “prosperous” and “depressed” industries, and if these
expressions cannot be defined by reference to objective standards of
scholastic achievement, of national wealth or of industrial prosperity
we have to define them by reference to the mean or the median of the
relevant statistical series. If we then ask questions about the average
good student or the typically depressed industry, we are speaking in
terms of the subseries created by the bisection of the entire distribution
by mean or median. It is the aim of this paper to show some very simple
algebraic relationships between measures for these subseries and the
usual measures of dispersions for entire distributions. In the course of
this inquiry we shall also discover a simple algebraic expression
connecting the mean deviation and standard deviation of the entire
distribution.


DEFINITIONS 


Let $X_1, X_2, \cdots, X_n$ be a series ordered by size of which no item
coincides with either the arithmetic mean $M$ or the median $Md$.¹ Let
the number of elements smaller than the mean be $i$, the number of
elements larger than the mean $n-i=s$. If the median does not coincide
with any one item in the series $n$ is necessarily an even number. Then,
by convention, the median is defined as the average of the $(n/2)$th and
the $(n/2+1)$st item. On the other hand, when $s=i=n/2$, the
arithmetic mean may be anywhere between the two central items.

We define now 

(a) as “partial means”
(1) $M_1$ and $M_2$ as the means of the items below and above the
median ($Md$)
(2) $M_i$ and $M_s$ as the means of the items below and above the
mean ($M$)


1 For the contrary case our results would be intrinsically unchanged though algebraically somewhat 
more complicated. 




(b) as “partial standard deviations” (and their squares as “partial
variances”)
(1) $\sigma_1$ and $\sigma_2$ as the standard deviations of the items below
and above the median
(2) $\sigma_i$ and $\sigma_s$ as the standard deviations of the items below and
above the mean
Between the mean and the partial means we can immediately
establish the following relationships:

$$M = \frac{M_1 + M_2}{2} \quad\text{and hence}\quad M - M_1 = M_2 - M \tag{1}$$

$$M = \frac{iM_i + sM_s}{n} \quad\text{and hence}\quad (i+s)M = iM_i + sM_s, \text{ and}$$

$$i(M - M_i) = s(M_s - M). \tag{2}$$


EXPRESSION FOR MEAN DEVIATION FROM MEAN AND MEDIAN 


We shall now express the mean deviation from the median and from
the mean in terms of the “partial” means and the general mean.
(1) Mean deviation from the median ($C_{Md}$)

$$C_{Md} = \frac{\sum |X - Md|}{n} = \frac{\sum_{1}^{n/2}(Md - X) + \sum_{n/2+1}^{n}(X - Md)}{n} = \frac{\frac{n}{2}Md - \sum_{1}^{n/2}X + \sum_{n/2+1}^{n}X - \frac{n}{2}Md}{n}$$

so that

$$C_{Md} = \frac{M_2 - M_1}{2} \tag{3a}$$

and according to (1)

$$C_{Md} = M - M_1 = M_2 - M. \tag{3b}$$


(2) Mean deviation from the mean ($C_M$)

$$C_M = \frac{\sum |X - M|}{n} = \frac{\sum_{1}^{i}(M - X) + \sum_{i+1}^{n}(X - M)}{n} = \frac{iM - \sum_{1}^{i}X + \sum_{i+1}^{n}X - sM}{n}$$

so that

$$C_M = \frac{i(M - M_i) + s(M_s - M)}{n} \tag{4a}$$

and according to (2)

$$C_M = \frac{2i}{n}(M - M_i) = \frac{2s}{n}(M_s - M). \tag{4b}$$
n n 


The formula (3a) reminds us immediately of the quartile deviation,
which is given by half the distance between the two quartiles $Q_1$ and
$Q_3$. The quartiles can be interpreted in our terminology as “partial
medians” in the same sense as we have called $M_1$, $M_2$ and $M_i$, $M_s$
partial means. The mean deviation from the median and, to a lesser
extent, the mean deviation from the mean (see (3b) and (4b)) are
thus seen to be refined “ranges”; i.e., distances between certain points
lying on the abscissa of a statistical distribution. It is sometimes said
in statistical textbooks that an advantage of the quartile deviation over
the other measures of dispersion is its simple meaning.² This can hardly
be maintained with respect to the mean deviations from mean and
median, which are shown to be essentially measures of the distances
between the general and the partial means. In addition the expressions
(3b) and (4b) may be of practical use in the computation of these
measures of dispersion especially when the data are ungrouped. Expressions
(4a) and (4b) of course reduce to (3a) and (3b) when $i=s=n/2$.
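
Relations (3a) to (4b) are easily checked on a small ungrouped series. The sketch below uses an invented series of six items, none of which coincides with the mean or the median, and compares the mean deviations computed directly with the values given by the partial means. The series and the helper name are hypothetical.

```python
from statistics import mean, median


def partial_means(series, cut):
    """Means and counts of the items below and above a cutting point."""
    below = [x for x in series if x < cut]
    above = [x for x in series if x > cut]
    return mean(below), mean(above), len(below), len(above)


X = [1, 2, 3, 4, 10, 16]  # invented series: mean 6, median 3.5
n = len(X)
M, Md = mean(X), median(X)

M1, M2, _, _ = partial_means(X, Md)  # partial means about the median
Mi, Ms, i, s = partial_means(X, M)   # partial means about the mean

# Mean deviations computed directly from their definitions.
C_Md = sum(abs(x - Md) for x in X) / n
C_M = sum(abs(x - M) for x in X) / n

print("C_Md direct:", C_Md, " (3a):", (M2 - M1) / 2, " (3b):", M - M1)
print("C_M direct:", C_M, " (4b):", 2 * i / n * (M - Mi), 2 * s / n * (Ms - M))
```

Here $i=4$ and $s=2$, so the series is skew and (4b) does not reduce to (3b); the direct and the short computations nevertheless agree in both cases.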


EXPRESSION FOR SKEWNESS 


The relationships which we have established also yield a simple
measure of skewness. Analogous to the common measure

$$\frac{(Q_3 - Md) - (Md - Q_1)}{(Q_3 - Md) + (Md - Q_1)}$$

relating to the differences between median and quartiles we can indeed
form the differences between the median and the partial means $M_1$ and
$M_2$ and compute the ratio of the difference and of the sum of these
differences. This gives





² See Yule and Kendall, Introduction to the Theory of Statistics, London, 1937 (11th edition) p. 149.





$$Sk = \frac{(M_2 - Md) - (Md - M_1)}{(M_2 - Md) + (Md - M_1)} = \frac{M_1 + M_2 - 2Md}{M_2 - M_1}$$

which becomes, according to (1) and (3a),

$$Sk = \frac{M - Md}{C_{Md}} \tag{5}$$

This expression has, as just shown, a meaning analogous to that relating
to the quartiles and, like it, varies between 0 and ±1. On the other
hand, it resembles the often used measure $3(M - Md)/\sigma$ in that it
depends on the position of the mean and therefore on every item in the
distribution.


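
The equivalence of the two forms of the skewness measure can be seen on a small example. The sketch below uses an invented six-item series and evaluates the measure once from the partial means about the median and once from expression (5). The series and names are hypothetical.

```python
from statistics import mean, median

X = [1, 2, 3, 4, 10, 16]  # invented series: mean 6, median 3.5
n = len(X)
M, Md = mean(X), median(X)

# Partial means about the median.
M1 = mean(x for x in X if x < Md)
M2 = mean(x for x in X if x > Md)

# Mean deviation from the median, computed directly.
C_Md = sum(abs(x - Md) for x in X) / n

# Ratio of the difference to the sum of the two distances from the median.
sk_from_partial_means = ((M2 - Md) - (Md - M1)) / ((M2 - Md) + (Md - M1))
# Expression (5).
sk_from_mean_and_median = (M - Md) / C_Md

print(sk_from_partial_means, sk_from_mean_and_median)
```

Both forms give 0.625 for this series, the positive sign reflecting the long upper tail.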
RELATION BETWEEN THE STANDARD AND MEAN DEVIATIONS 


In order to show the relationship between the standard deviation
and the “partial” quantities defined above we write, according to a
well-known property,

$$\sigma_1^2 = \frac{\sum_{1}^{n/2}(X - M)^2}{n/2} - (M - M_1)^2$$

$$\sigma_2^2 = \frac{\sum_{n/2+1}^{n}(X - M)^2}{n/2} - (M_2 - M)^2.$$

Adding and dividing by 2

$$\frac{\sigma_1^2 + \sigma_2^2}{2} = \frac{\sum_{1}^{n/2}(X - M)^2 + \sum_{n/2+1}^{n}(X - M)^2}{n} - \frac{(M - M_1)^2 + (M_2 - M)^2}{2}.$$

According to (3b) and to the definition of $\sigma$, this can be written

$$\sigma^2 = C_{Md}^2 + \frac{\sigma_1^2 + \sigma_2^2}{2}. \tag{6}$$

Similarly we derive from

$$\sigma_i^2 = \frac{\sum_{1}^{i}(X - M)^2}{i} - (M - M_i)^2$$

$$\sigma_s^2 = \frac{\sum_{i+1}^{n}(X - M)^2}{s} - (M_s - M)^2$$

the following alternative expression for the variance

$$\sigma^2 = \frac{i(M - M_i)^2 + s(M_s - M)^2}{n} + \frac{i\sigma_i^2 + s\sigma_s^2}{n} = \frac{n}{4i}\cdot\frac{4i^2(M - M_i)^2}{n^2} + \frac{n}{4s}\cdot\frac{4s^2(M_s - M)^2}{n^2} + \frac{i\sigma_i^2 + s\sigma_s^2}{n}.$$

According to (4b) this gives

$$\sigma^2 = C_M^2\,\frac{n^2}{4is} + \frac{i\sigma_i^2 + s\sigma_s^2}{n}. \tag{7}$$
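
Expressions (6) and (7) can be verified numerically. The sketch below uses an invented six-item series and computes the variance directly (with divisor n, as in the text) and then from the mean deviations and the partial variances. The series and the helper name are hypothetical.

```python
from statistics import mean, median


def variance_about_own_mean(values):
    """Mean squared deviation of the values from their own mean (divisor = count)."""
    values = list(values)
    centre = mean(values)
    return sum((x - centre) ** 2 for x in values) / len(values)


X = [1, 2, 3, 4, 10, 16]  # invented series: mean 6, median 3.5
n = len(X)
M, Md = mean(X), median(X)

below_md = [x for x in X if x < Md]
above_md = [x for x in X if x > Md]
below_m = [x for x in X if x < M]
above_m = [x for x in X if x > M]
i, s = len(below_m), len(above_m)

C_Md = sum(abs(x - Md) for x in X) / n
C_M = sum(abs(x - M) for x in X) / n

variance = variance_about_own_mean(X)

# Expression (6): partial variances about the median's two halves.
by_6 = C_Md ** 2 + (variance_about_own_mean(below_md)
                    + variance_about_own_mean(above_md)) / 2

# Expression (7): partial variances of the items below and above the mean.
by_7 = (C_M ** 2 * n ** 2 / (4 * i * s)
        + (i * variance_about_own_mean(below_m)
           + s * variance_about_own_mean(above_m)) / n)

print(variance, by_6, by_7)
```

All three values equal 170/6, or about 28.33, for this series: in (6) the parts are 16 and 12.33, in (7) they are 24.5 and 3.83.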


As before, the expression involving the “partial” quantities relating
to the median is simpler than that involving the “partial” quantities
relating to the mean, and the latter reduces to the former when
$i=s=n/2$. Both expressions (6) and (7) show in a novel form the
characteristic difference in behavior of mean deviation and standard
deviation for different distributions.

The mean deviations from both mean and median depend only upon
the distances of the partial means from the general mean. (Quantities
which, by analogy to the partial standard deviations, could be defined
as partial mean deviations, do not enter into the expressions for $C_{Md}$
and $C_M$.) The standard deviation, on the contrary, is a function of both
these differences and of the dispersion of the elements around the
partial means.³ It is at once evident from these expressions that the
standard deviation can be equal to the mean deviation only in the exceptional
case of a symmetrical and ideally U-shaped distribution, in which $i=s$
and in which the partial variances are zero.

³ The coefficient $n^2/4is$ of $C_M^2$ in (7) shows that, in addition, the standard deviation increases when,
with mean deviation from the mean and partial standard deviations constant, the skewness of the
distribution increases. In the expression (6) this element is taken care of directly by the quantities $\sigma_1$ and
$\sigma_2$, which can simultaneously become 0 only if $i=s$, whereas this is not the case for $\sigma_i$ and $\sigma_s$.

The knowledge of this difference in behavior may help in choosing 
between the rival measures of dispersion in statistical work of a purely 
descriptive character (i.e., not concerned with sampling problems). Thus 
in educational statistics, if we speak about dispersion of the marks 
of a group of students, we may have in mind merely the distance be- 
tween the average good and the average bad student, or we may want 
to take account also of the dispersion within good and within bad stu- 
dents. Only in the latter case should the standard deviation be used, 
whereas in the former case one would have to use the quartile devia- 
tion, mean deviation from median or mean deviation from mean, ac- 
cording to the type of average selected. 

It is also seen that the knowledge of mean and standard deviation of 
a distribution tells us something about the partial standard deviations. 
By writing expressions (6) and (7) in the following form

$$\sigma^2 - C_{Md}^2 = \frac{\sigma_1^2 + \sigma_2^2}{2} \qquad (8a)$$

$$\sigma^2 - \frac{n^2}{4is} \, C_M^2 = \frac{i\sigma_i^2 + s\sigma_s^2}{n} \qquad (8b)$$


we obtain an average of the two partial variances weighted by the
frequencies to which each variance relates (since these frequencies are by
definition always $n/2$ for both $\sigma_1$ and $\sigma_2$, a simple average is justified in
(8a)).

In the case of a typically symmetrical distribution, not only do the
two expressions coincide but we should have in addition $\sigma_1=\sigma_2$, so that
the difference of the squares of standard deviation and of mean
deviation would yield directly the partial variance.


RELATION TO GEARY’S TEST OF NORMALITY 


As was stated in the beginning, this note is limited to the field of 
descriptive statistics. In one respect, however, the result reached may 
be of interest to sampling theory. It is known that for the normal 
curve we have 


$$\frac{\text{Mean deviation from mean}}{\text{Standard deviation}} = \sqrt{\frac{2}{\pi}} = 0.79788\ldots$$

Now this ratio, which may be denoted by the letter $a$, has been
proposed as a test of normality by R. C. Geary, who has derived its
approximate sampling distribution and calculated tables of the mean
value, the standard error, and 10, 5 and 1 per cent probability levels of
$a$ for samples from a normal population.⁴ This test was meant to
supplement the usual tests $\sqrt{b_1} = m_3 / m_2^{3/2}$ and, in particular,
$b_2 = m_4 / m_2^2$, where $m_2$, $m_3$ and $m_4$ denote the moments taken from the mean.

These two criteria are easily connected with the shape of the
frequency distribution, $\sqrt{b_1}$ standing for symmetry and $b_2$ for kurtosis.
The relationship which we have established makes it possible to
interpret the $a$-test in a similarly concrete way. Supposing a symmetrical
distribution ($i=s=n/2$), and dividing by $\sigma^2$, expression (8b) becomes

$$1 - \frac{C_M^2}{\sigma^2} = \frac{1}{2} \left( \frac{\sigma_i^2}{\sigma^2} + \frac{\sigma_s^2}{\sigma^2} \right)$$

and if symmetry is such that $\sigma_i=\sigma_s$, this reduces to

$$1 - a^2 = \frac{\sigma_i^2}{\sigma^2}.$$

For symmetrical distributions the ratio of mean deviation to
standard deviation is thus negatively related to the partial standard
deviations, taken in units of $\sigma$. A decrease in $a$ therefore means an increase of
$\sigma_i/\sigma$ or $\sigma_s/\sigma$, and for unimodal symmetrical distributions this is possible
only by a lengthening of the tails of the distribution compensated by
other elements drawing closer to the center so as to keep $\sigma$ constant, i.e.,
by a development of the distribution in the leptokurtic direction. We
are thus able to understand intuitively the high negative correlation
between the $b_2$ criterion standing for kurtosis and Geary’s $a$ which
E. S. Pearson and Geary have established experimentally and
theoretically.
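
The behavior of the ratio can be illustrated numerically. The sketch
below is only an illustration under stated assumptions: the function
name, the seed, the sample size of 20,000 and the choice of a
double-exponential sample as the leptokurtic comparison are all
hypothetical and are not taken from Geary's tables.

```python
import math
import random
import statistics


def geary_ratio(values):
    """Return a = (mean deviation from the mean) / (standard deviation)."""
    n = len(values)
    mean = statistics.fmean(values)
    mean_dev = sum(abs(x - mean) for x in values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return mean_dev / std_dev


# Value of the ratio for the normal curve, sqrt(2/pi).
NORMAL_VALUE = math.sqrt(2 / math.pi)

rng = random.Random(1943)  # arbitrary seed, for reproducibility only

# Large sample from a normal population: a should lie near sqrt(2/pi).
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
a_normal = geary_ratio(normal_sample)

# Symmetrical but longer-tailed (leptokurtic) sample: a should be lower.
laplace_sample = [
    rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(20000)
]
a_leptokurtic = geary_ratio(laplace_sample)

# Ideally U-shaped sample with zero partial variances: a equals 1.
a_u_shaped = geary_ratio([-1.0, -1.0, 1.0, 1.0])
```

The three cases line up with the argument of the text: the longer the
tails relative to the center, the larger the partial standard deviations
in units of the standard deviation, and the smaller the ratio.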


⁴ R. C. Geary, “The Ratio of the Mean Deviation to the Standard Deviation as a Test of
Normality,” Biometrika, Vol. 27 (1935), pp. 311-332; E. S. Pearson, “A Comparison of $b_2$ and Mr. Geary’s
$w_n$ Criterion,” ibid., pp. 333-352; R. C. Geary, “Note on the Correlation between $b_2$ and $w_n$,” ibid.,
pp. 353-355; R. C. Geary, “Moments of the Ratio of the Mean Deviation to the Standard Deviation
for Normal Samples,” ibid., Vol. 28 (1936), pp. 295-305; R. C. Geary and E. S. Pearson, Tests of
Normality, Biometrika Office, London, 1938.













ON SOME CENSUS AIDS TO SAMPLING 


By Morris H. Hansen and W. Edwards Deming
Bureau of the Census 


SOME OF THE AIMS IN SAMPLING DEVELOPMENT 


SAMPLING speeds results, reduces costs and demands on manpower,
and reduces the burden of reporting. These are factors of prime
importance in times of stress, like the present, when situations must 
often be evaluated and acted upon within a brief space of time. Sam- 
pling is playing an indispensable role by providing information for many 
and various administrative and planning programs under pressure and 
requirements of accuracy that a few years ago could not have been met. 
The gains in sampling practice that have proved to be so useful in war 
time will be in equal demand in time of peace and readjustment. 

The purpose of any survey, sample or complete, is to provide factual 
evidence that presumably will be useful in formulating a course of ac- 
tion in the solution of some problem. There is thus a certain degree of 
accuracy required in an estimate that is to be obtained from a sample. 
Consequently, the aim desired in designing a sample is to devise pro- 
cedures that will 


i. Operate within the available budget and limitations of time and 
manpower; 
ii. Operate also within other imposed administrative limitations or 
restrictions; yet 
iii. Produce the maximum amount of information that is possible 
within the limitations i and ii. 

To meet this requirement, the plans for the survey will call for 
the use of any available equipment, personnel (organized field or 
office staff), and other facilities (maps, available statistical data, 
existing knowledge of the universe, and the like). Even though 
this requirement be met, the sampling plan is still not satisfac- 
tory unless evidence can be produced in the form of mathematics 
coupled with experience to show that it will 

iv. Give results that are reasonably sure to fall within a certain allow- 
able sampling error, which for any survey will be predetermined 
by the administrative uses that are to be made of the data. 


A sample design is judged by the way in which it measures up to these
requirements. One design may be better than another in one requirement,
and worse in another. A sample design is improved by any device
that produces an advantage in meeting one or more of these
requirements.

With the aim of improving the efficiency of its surveys, the Census 
has been conducting a sampling research and planning program for 
compiling and investigating the distributions of various characteristics 
of universes to be sampled, evaluating the effectiveness of alternative 
sampling procedures and designs, developing new theory and adapting 
existing theory to meet the needs of new requirements, and measuring 
experimentally not only the sampling variances encountered, but also 
the time and cost factors involved in alternative methods of collecting 
data. This research and planning program is supplemented by the 
preparation of sampling aids in the form of summary cards for small 
areas, some of which are described in the second half of this paper. 

On the basis of recent advances in theory and practice in the Census 
and elsewhere, and on the basis of accumulated experience in conduct- 
ing censuses and surveys of various kinds, gains are being made in the 
specification of sample designs that provide the most information pos- 
sible under budgetary and other administrative limitations that are 
always imposed, yet utilizing existing facilities to the fullest degree. It 
is now possible to lay out satisfactory sampling designs to provide 
factual data in the solution of many kinds of administrative problems. 
Further improvement in such designs will be made concurrently with 


i. Improved availability of statistical information already on 
hand. This may take the form of 
a. Punched card summaries of statistics for small areas, for use 
in stratification and sample selection. An example is the E.D. 
Summary Card (vide infra) prepared from the Census of 1940. 
b. Analytic summaries of the distributions, variances and co- 
variances of certain characteristics of the universe, computed 
for different sizes and kinds of sampling units, and for diverse 
characteristics of the universe. Such information is needed in 
deciding the size and kind of sample to be taken. 

ii. Additional theory (particularly concerning systematic sampling 
designs, type and method of selection of sampling units, errors of 
response, stratification, etc.). 

iii. Additional experience and records regarding field work, interview
and travel costs; reactions of the respondents, and the effects of
these reactions on the inferences to be drawn from the data
collected; comparisons between what was actually done in the field,
and what was intended to be done; additional experience in
wording questionnaires and training interviewers.


SOME RECENT ADDITIONS TO SAMPLING AIDS 


Although the stress of war conditions has produced severe disruptions 
in population and occupational distributions, data from the Census of 
1940 provide a mass of detail that is invaluable in the design of many 
sampling projects that deal with population or agricultural items. More 
recent figures on the distribution of certain population characteristics 
would be desirable, and the estimates of net migration furnished by 
registrations and rationing programs help to provide these. However, 
data from the 1940 Census, particularly with regard to the
characteristics of small areas as distinguished from characteristics of individuals
in the population, are extremely useful and effective in designing current
sampling projects. In view of the usefulness of the 1940 Census data, we
here describe three sets of summary cards for small areas that are valu- 
able as sampling aids. Special tabulations of these cards can be provided 
at cost. 

The E. D. Summary Card.—The E. D. Summary Card¹ is an example
of information derived from the 1940 Census, recently made available 
on punched cards. Through this card, data for small areas are accessible 
for purposes of stratification and sample selection, and for certain 
summaries. The card was devised for the purpose of producing useful 
controls for sampling in various kinds of surveys—housing, population, 
agriculture, labor force characteristics, marketing, etc. An E.D. Sum- 
mary Card has been punched for every E.D. outside the “block cities.” 
The characteristics punched on the E.D. Summary Card are
enumerated below.²


¹ The abbreviation E.D. denotes enumeration district. An E.D. is one of the small administrative
areas into which the entire country is divided for the purpose of taking a census. Every point of the
country lies in one but only one E.D. An E.D. is delineated with the intention that it shall contain a
population no greater than one enumerator can cover within the enumeration period. It is further
required, however, that no E.D. shall cross the boundary of any incorporated place, city ward, census
tract (in a tracted city), township, or other minor civil division, no matter how small. The usual range in
the population of an E.D. is between 500 and 1500, though because of the second of the two requirements
just noted, many E.Ds. contain populations much smaller than 500, some even as low as one or two
persons, or none at all. A small E.D. is usually assigned to an enumerator who is to cover another one
nearby, with the consequence that there are more E.Ds. than enumerators. As a matter of fact, in the
1940 Census there were 154,000 E.Ds., but only 110,000 enumerators.

² The starred items (*) are ratios that have been calculated for the entire minor civil division of which
the particular E.D. forms a part. Every E.D. in a minor civil division shows the same minor civil division
ratios. A minor civil division is the smallest political area recognized in the census. Incorporated places
are separate minor civil divisions, and outside of incorporated places, townships, election precincts,
beats, etc. are used, the nature of the M.C.D. varying from one state to another. Each minor civil division
contains one or more E.Ds., but in many instances the M.C.D. and the E.D. are coextensive.






























  State and county
  Enumeration district
  Minor civil division
  Metropolitan district
  Cropland density in the minor civil division*
  Population density in the minor civil division*
  Urban-rural code
  Total population
  Native white male population
  Per cent native white male
  Negro male population
  Per cent Negro male
  Farm population
  Per cent farm
  Total number of dwelling units
  Number of owner-occupied dwelling units
  Per cent owner-occupied
  Average rent per dwelling unit (based on rent paid in rented dwelling
    units, and estimated rental value of owner-occupied dwelling units)
  Average farm acreage code for the minor civil division*
  Average farm value code for the minor civil division*
  Total number of farms

(For explanation of * see footnote 2.)
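
The use of such summary cards for stratification and sample selection
can be sketched in a few lines of Python. This is a minimal sketch under
assumptions, not the Census procedure: the record layout, the field names
("ed", "urban_rural", "population"), the equal sampling fraction in every
stratum and the simple random draw within strata are all hypothetical
choices made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum.

    records     -- list of dicts, one per small area (hypothetical layout)
    stratum_key -- name of the field used to form strata
    fraction    -- sampling fraction applied within each stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Take at least one unit from every stratum.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical summary records: 30 rural and 10 urban enumeration districts.
records = [
    {"ed": f"R{k}", "urban_rural": "rural", "population": 900}
    for k in range(30)
] + [
    {"ed": f"U{k}", "urban_rural": "urban", "population": 1400}
    for k in range(10)
]
selected = stratified_sample(records, "urban_rural", fraction=0.2, seed=1)
```

With a fraction of 0.2 the sketch selects six rural and two urban
districts. In practice the strata might equally be formed from any other
item carried on the card, such as per cent farm or average rent.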


The Block Summary Card.—It was just stated that the E.D.
Summary Card was punched for E.Ds. outside block cities.³ It was not
necessary to punch the E.D. card within block cities, because there
existed already the Block Summary Card for each block in these cities.
The block is ordinarily a much smaller unit than the E.D. and hence
better for most sampling purposes. The block cards correspond to the
statistics published in the Supplement to the First Series of Housing.
The characteristics punched on the Block Summary Card are
enumerated below.

  City
  Ward
  Enumeration district
  Block
  Number of structures
  Total dwelling units
  Owner-occupied dwelling units
  Tenant-occupied dwelling units
  Vacant dwelling units for sale or rent
  Vacant dwelling units, other
  Dwelling units reporting year built
  Dwelling units built 1930-1940
  Dwelling units built 1920-1929
  Dwelling units built 1900-1919
  Dwelling units built 1899 or before
  Occupied dwelling units
  Dwelling units occupied by nonwhite households
  Dwelling units reporting persons per room
  Dwelling units with 1.51 or more persons per room
  Dwelling units reporting on plumbing equipment and repairs
  Dwelling units needing repair or with no private bath
  Dwelling units needing repair
  Dwelling units with no private bath
  Dwelling units reporting on mortgage
  Dwelling units mortgaged
  Dwelling units reporting rent
  Total monthly rent
  Average monthly rent per dwelling unit

³ A “block city” is a city for which statistics by blocks have been issued. This has been done in
the Supplement to the First Series Housing Bulletins (16th Census: 1940) for the 191 cities that had
50,000 or more inhabitants in 1930.


The County Summary Cards.—A number of cards have been pre- 
pared to bring together significant summary information by counties. 
Additional county information is assembled from time to time on new 
cards as required in current sampling studies. Most of the data on the 
cards are already available in published form, but the sources are 
scattered. The county cards provide data immediately accessible for 
sampling use. 

In addition to the summary cards for counties, a set of cards has 
been prepared to provide primary sampling units consisting of one, 
two, or three contiguous counties. The 3067 counties of the country 
have been consolidated into approximately 2000 primary sampling 
units. In forming a primary sampling unit, contiguous counties are 
grouped together so as to give a unit that is as internally heterogeneous 
as possible. Such consolidations have been introduced for the purpose of 
providing sampling units more efficient than single counties, and are 
intended for use when the number of administrative centers is limited 
and the sampling must be confined to a field of operation around each 
administrative center. As a matter of fact, the Census made these con- 
solidations in the process of revising the sample used for the Monthly 
Report on the Labor Force and for many special surveys. Approxi- 
mately 60 administrative centers are used for this sample. 








THE SOCIAL INSURANCE MOVEMENT 


By R. Clyde White*
University of Chicago 


SOCIAL INSURANCE is concerned with some of the risks which are
familiar in the field of private casualty insurance. One type of
casualty insurance provides indemnity for losses or damages arising out of
the legal liability of the insured for loss or damage to the person or
property of another. The second type provides benefits in the case of
loss or damage to the person or property of the insured. Most social insur- 
ance belongs to the second type, but workmen’s compensation should 
probably be classified under the first because the employer is legally 
liable to pay compensation to his injured employee. Hence, social in- 
surance may be regarded as public casualty insurance, authorized by 
law and usually administered in whole or in part by government and 
designed to maintain some portion of the income of the insured. A 
social insurance law defines a risk, prescribes a formula for the deter- 
mination of the amount of cash or other benefits and provides for the 
collection of contributions to pay the costs of benefits. A contribution 
tax is generally levied on the employer or on the employer and the em- 
ployee. Frequently the law requires an appropriation out of general 
governmental revenues as a subsidy to the system. Coverage is always 
defined in reference to an employed person, sometimes including self- 
employed persons, and benefits are frequently payable to the de- 
pendents of insured persons under specified conditions. 

In large measure the enactment of social insurance laws in a nation 
has followed the growth of large scale industrial and commercial enter- 
prise. Historically the roots of social insurance reach back many cen- 
turies, but these early experiments were conducted by small mutual aid 
groups. About the end of the eighteenth century some municipal gov- 
ernments in Europe set up very limited forms of social insurance, but 
the laws were permissive and usually not very successful. In 1786, John 
Acland proposed a general plan of social insurance in England as a 
means of relieving the taxpayer of his relief burdens. The project was 
discussed in Parliament, but no legislative action was taken. Tom 
Paine advocated something of the sort in this country. It was not until 
1883, however, that a national legislative body adopted a social
insurance law which was applicable by compulsion to large numbers of
workers. This was the famous sickness insurance law sponsored by Bismarck
in Germany. Agricultural and domestic service workers are frequently
covered by social insurance laws, but in general the concern of the
national legislative bodies has been with industrial and commercial
employees.

* The writer wishes to acknowledge the painstaking work of Miss Eloise Whitney and Mrs. Lois
Sentman, two of his graduate students in the University of Chicago. They spent many hours examining
the sources and tabulating the material of Table I in this article.

Social insurance may be voluntary or compulsory. If it is voluntary, 
it offers certain benefits to the contributor or to the potential benefi- 
ciary which he may take or leave. A law which prescribes a compulsory 
scheme defines the employments covered and names the beneficiaries by 
industrial or occupational class. Those involved have no choice. Under 
most compulsory laws there are likely to be certain classes of persons 
who may elect coverage, but they represent a small percentage of the 
population insured. Voluntary social insurance was for several decades 
characteristic of Belgium, France, The Netherlands, Scandinavia and 
Switzerland, but during the last fifteen years these countries have been 
changing to compulsory coverage. Compulsory coverage gives greater 
assurance that all persons subject to a risk will be continuously covered 
and, therefore, protected, and experience has shown that benefits are 
likely to be more nearly adequate to assure minimum subsistence under 
this type of law. 

The aim of this study has been to chart the course of social insurance 
legislation and to determine what laws are now in operation in all parts 
of the world. In order to simplify this task the phrase, “social insur- 
ance law,” has been rather narrowly defined. As used in this paper, a 
social insurance law is one which was enacted by a national legisla- 
tive body for the protection of a class or classes of workers which may 
include government employees—such as civil service workers, teachers, 
policemen, etc.—but must include others. Pension legislation for war 
veterans and their dependents and war risk insurance which many gov- 
ernments have had and still have in force have been excluded, because 
they relate to the special condition of war and not to the ordinary 
peace-time risks. A departure has to be made from the strict concept of 
national legislation in the case of the United States, because the courts 
have held that the states have the power to enact protective legislation 
such as social insurance. For example, the state of Washington adopted 
the first general workmen’s compensation act in this country in 1911 
(the Federal Government enacted a law in 1908 but it covered only 
federal employees) and Wisconsin enacted the first unemployment com- 
pensation law in 1932. No attempt is made in this paper to show 
changes in the original law, unless it was entirely repealed, which has 
occurred only once, and that was in the case of the Russian unemploy- 
ment insurance law. 

Obviously public assistance, as it is known in English speaking
countries, has been excluded from consideration here. It is not in any
ordinary sense insurance. In public assistance the risk is only vaguely
defined; the liability of a public fund for public assistance payments is
not established, because officials have wide discretion; funds for the
payment of public assistance are appropriated out of general revenues
and vary according to political attitudes from one legislative session
to another; and the potential recipient must submit to a means test. By
contrast, for the purposes of a social insurance law the risk is carefully
defined, the liability of the insurance fund is stated in terms of a
benefit formula, funds are derived from earmarked taxes or in other
predetermined ways, and the beneficiary does not submit to a means
test.

TABLE I

SOCIAL INSURANCE LAWS BY COUNTRY OR DEPENDENCY, RISK
COVERED AND DATE OF ORIGINAL ENACTMENT*

Country or          Risk covered and date of original enactment
dependency          Accident     Sickness     Invalidity   Unemployment Old age      Survivors    Maternity

Africa
  Italian Colonies  1931†        --‡          --           --           --           --           --
  Mozambique        1931†        --           --           --           --           --           --
  Rhodesia          1931†        --           --           --           --           --           --
  South Africa      1931†        --           --           --           --           --           --
  West Africa       1940         --           --           --           --           --           --
Asia
  British India     1923         --           --           --           --           --           1941§
  China             1929         --           --           --           --           --           --
  Iraq              1937         --           --           --           --           --           --
  Japan             1911         1922         --           --           --           --           --
  Palestine         1927         --           --           --           --           --           --
  Philippines       1927         --           --           --           --           --           --
Australasia
  Australia         1900         1938         1908         --           --           1938         1938
  New Zealand       1908         1938         1938         1938         1938         1938         1938
Europe
  Austria           1887         1888         1925         1920         1925         1925         1888
  Belgium           1903         1903         1924         1933         1924         1937         --
  Bulgaria          1908         1924         1937         1925         1924         1924         1924
  Czechoslovakia    1888         1888         1924         1921         1924         1924         1888
  Denmark           1898         1898         --           1907¶        --           --           1937
  Esthonia          1923         1911         --           --           --           --           1911
  Finland           1895         --           1937         1917         1939         --           --
  France            1898         1928         1928         1905         1938         1928         1928
  Germany           1884         1883         1889         1927         1889         1911         1883
  Great Britain     1897         1911         1911         1911         1925         1925         1911
  Greece            1901         1934         1932         --           1932         1932         1934
  Hungary           1907         1927         1928         --           1928         1928         1927
  Irish Free State  1906         1911         1911         1911         --           1935         1911
  Italy             1898         1928         1923         1919         1923         1923         1935**
  Latvia            1927         1911         --           --           --           --           1911
  Lithuania         1938         1925         --           --           --           --           1925
  Luxemburg         1902         1901         1911         --           1911         1911         1901
  Netherlands       1901         1913         1913         1916         1913         1913         1913
  Norway            1894         1909         --           1906¶        --           --           1909
  Poland            1920         1920         1927         1924         1927         1927         1920
  Portugal          1913         1919         1919         --           1919         --           1919
  Roumania          1912         1912         1912         --           1912         --           1912
  Russia            1903         1922         1922         --           1922         1922         1922
  Spain             1900         --           --           1919         1939         --           --
  Sweden            1901         1935         1913         1934         1913         --           1935
  Switzerland       1912         1911         --           1924         --           --           1911
  Yugoslavia        1922         1922         1922         --           1909         1922         1922
Islands, Pacific
  Fiji Islands      1940         --           --           --           --           --           --
  Hawaii            1915††       --           --           1937††       --           --           --
  New Caledonia     1931‡‡       --           --           --           --           --           --
North America
  Canada            1902§§       --           --           1940         --           --           --
  Costa Rica        1925         1941         --           1941         1941         1941         1941
  Cuba              1916         --           1940         1940         1940         1940         1939
  Guatemala         1924         --           --           --           --           --           1926
  Mexico            1906         --           --           --           --           --           --
  Newfoundland      1908         --           --           --           --           --           --
  Nicaragua         1930         --           --           --           --           --           --
  Panama            1942         1942         1942         --           1942         --           1942
  Puerto Rico       1916¶¶       --           --           --           --           --           --
  United States***  1911         --           1935         1932         1935         1939         --
South America
  Argentina         1923         --           --           --           1923         --           1936
  Bolivia           1924         1939         --           --           1939         --           --
  Brazil            1919         1940         1940         --           1940         1940         1940
  Chile             1924         1924         1924         --           1924         --           1924
  Colombia          1915         1939         --           1939         --           --           1939
  Dominican Rep.    1932         --           --           --           --           --           --
  Ecuador           1928         1935         1935         --           1935         1937         --
  Paraguay          1927         --           --           --           1941         --           --
  Peru              1911         1941         1941         --           1941         1941         1941
  Uruguay           1914         --           1919         1939         1919         --           --
  Venezuela         1928         1940         1940         1940         1940         --           1940

* Page citations and dates of publications are not indicated because of the large number of such
references, but the data for this table were obtained from the following sources: Legislative Series,
International Labour Organization, Geneva and Montreal, 1923 to the present; International Labour
Review, International Labour Organization, Geneva and Montreal; Monthly Labor Review, U. S.
Department of Labor; and Barbara N. Armstrong, Insuring the Essentials, The Macmillan Company,
New York, 1932.

† Probably some minor legislation earlier.

‡ Indicates absence of this kind of coverage.

§ Covers only women working in mines.

¶ Ghent system only. Belgium and Denmark adopted the Ghent system at early dates also but
changed to the more common system on the dates given.

** Includes benefits at marriage and birth.

†† These were territorial laws and were the first such laws for this dependency.

‡‡ Law adopted some time before this date, but exact date could not be established.

§§ First provincial workmen's compensation act.

¶¶ First territorial law.

*** The federal law of 1908 covered only federal employees for workmen's compensation. The
date given is the Washington State Act which was the first workmen's compensation law to cover
employees other than governmental. In 1942 Rhode Island redefined unemployment to include
unemployment due to illness and now pays compensation in such cases, but this is not defined here as
sickness insurance. The invalidity insurance covers only railroad employees under very restricted conditions.
The Wisconsin Unemployment Compensation Act is shown as the first United States unemployment
insurance law.

The clearly defined risks which were found in this study are accident 
(including occupational disease), invalidity of long duration, maternity, 
old age, sickness, survivors (i.e., generally only widowhood and orphan- 
age) and unemployment. At the beginning of the present war all of 
these risks were covered in Austria, Bulgaria, Czechoslovakia, France, 
Germany, Great Britain, the Netherlands, Poland and New Zealand. 
Table I shows the years in which social insurance laws in force at the 
end of November, 1942, were first adopted by national legislative 
bodies. Exceptions in the case of the United States will be indicated. 
A “national social insurance law” may apply to certain classes of the 
entire population, or it may permit a political subdivision to enact 
social insurance legislation and cover certain defined classes. For exam- 
ple, Titles III and IX of the United States Social Security Act did not 
create an unemployment insurance system in this country, but they 
carried incentives to the states to adopt such measures. Obviously that 
was national legislation in the social insurance field, but the date of the 
Wisconsin Act which preceded the Social Security Act is shown as the 
first American unemployment insurance law. Dates for laws applying 
to certain dependencies, such as Hawaii and Rhodesia, are given sepa- 
rately just to indicate the extension of social insurance measures to 
outlying parts of the world. 

Several observations may be made concerning Table I. First, all of 
the countries and dependencies have accident insurance (workmen’s 
compensation in English-speaking countries) but no other form of social 
insurance is found in all of them. Second, of the 253 laws counted and 
listed 149 are found in European countries. That is 58.9 per cent of the 
total. This is perhaps not unexpected, since social insurance was a 
European invention. Third, only 18 of the 52 laws adopted between 
1930 and 1939 were in Europe. Fourth, from 1940 to the end of 1942 
new laws numbering 34 were adopted, and all of them were outside of 
Europe. Within 12 years the non-European world adopted 68 new social 
insurance laws. It appears to be clear that this European invention has 
now been accepted in principle and to a large degree in fact as a useful 
social device in most of the world. Many population groups in the 
countries which have various types of social insurance are not covered, 
but historically the tendency has been to broaden coverage once a type 
of social insurance was adopted, and it is reasonable to expect that
extension of coverage will continue. At present only two countries
approach one hundred per cent coverage of all population groups, and
they are Russia and New Zealand. The only large populated areas of 
the world which have not yet been affected by the social insurance 
movement are some of the large Pacific islands, southeast Asia, south- 
west Asia, Turkey and the central mass of Africa. These areas probably 
do not contain more than 10 per cent of the population of the world, and 
because of their lack of industrialization social insurance is not well 
adapted to the economy of these areas. 


TABLE II

NUMBER OF NATIONS HAVING SOCIAL INSURANCE
LAWS BY RISK COVERED

Risk              Number of nations
Accident                 64
Invalidity               31
Maternity                37
Old age                  34
Sickness                 37
Survivors                24
Unemployment             26




The prevalence of various forms of social insurance is shown in 
Table II. The risk of industrial accident has been most widely recog- 
nized as a hazard which should be covered by insurance, and many 
countries have only this type of social insurance. Sickness and mater- 
nity insurance stand next in frequency of adoption. 

The growth of social insurance in time is shown in Table III which 
shows by quinquennia the dates of origin of particular kinds of social 
insurance in the various countries. There was a steady increase in the 
number of new laws adopted in the several countries up to the outbreak 
of the first world war, but during the period of the war new enactments 
dropped sharply. The quinquennium immediately following the war 
was marked by the largest number of new laws of any similar period. 
The early years of the depression showed a drop in new legislation, but, 
when recovery began in the middle thirties, the movement regained its 
momentum. 
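
The pattern described above can be checked against the figures of Table III. The sketch below is only an illustration: the dictionary is transcribed from the table and the names are hypothetical. Note that the two rows for the 1930s sum to 53 in the table, whereas the text earlier cites 52 laws for 1930 to 1939; the sketch reports the table's figures and does not attempt to resolve the difference.

```python
# Number of laws by quinquennium, transcribed from Table III.
laws_by_period = {
    "1880-84": 3, "1885-89": 9, "1890-94": 1, "1895-99": 5,
    "1900-04": 12, "1905-09": 13, "1910-14": 35, "1915-19": 16,
    "1920-24": 41, "1925-29": 31, "1930-34": 15, "1935-39": 38,
    "1940-42": 34,
}

# Sum of the rows, to compare with the stated total of 253.
table_total = sum(laws_by_period.values())

# Period with the largest number of new laws.
peak_period = max(laws_by_period, key=laws_by_period.get)

# Laws adopted in the two quinquennia of the 1930s.
thirties_total = laws_by_period["1930-34"] + laws_by_period["1935-39"]

print(table_total, peak_period, thirties_total)
```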

TABLE III

NUMBER OF SOCIAL INSURANCE LAWS ADOPTED,
BY QUINQUENNIA

Quinquennium      Number of laws
Total                  253
1880-84                  3
1885-89                  9
1890-94                  1
1895-99                  5
1900-04                 12
1905-09                 13
1910-14                 35
1915-19                 16
1920-24                 41
1925-29                 31
1930-34                 15
1935-39                 38
1940-42                 34

Nothing in the foregoing tables indicates the extent of coverage in
the different countries of a specific form of social insurance, and nothing
is said about the adequacy of the benefit provisions. There is great
diversity with respect to both groups covered and the adequacy of
benefits. The aim of this study was to determine the countries which
had accepted the principle of social insurance as a method of achieving
social security and the kinds of risk which had been recognized as
insurable.

The widespread attention, which the recent report on the British 
social services by Sir William Beveridge received, is undoubtedly due 
in large part to the general understanding of the insurance principle in 
all countries and to the common recognition that this principle is ap- 
plicable to other risks and to other groups in the population than those 
currently covered. The fact that this report concerns itself almost en- 
tirely with social insurance and is uncomplicated by more elaborate 
proposals for reorganizing the national economy may account for the 
cordiality of its reception in this country. This contrasts sharply with 
that accorded the recent report of our National Resources Planning 
Board, dealing with the same problems. The report of the Planning 
Board lacks the simplicity and clarity of the Beveridge report, and this 
lack was undoubtedly due in large measure to the complex program 
which the Planning Board outlined. The Beveridge report has given 
new impetus to the social insurance movement. 

















HORACE SECRIST, 1881-1943 


THE SCIENCES of statistics and economics lost a scholar of high rank
in the death of Professor Horace Secrist. As a research student, as
an author, and as a teacher, he always maintained a rigorous scientific
attitude toward his work. His main research activities lay in the realm 
of statistics, but his interests were not confined to technique, as he 
was always seeking for the hidden meaning, or relations, that lay back 
of the factual data. He firmly believed that truth must be discovered 
from an analysis of factual data and he had little patience with “arm- 
chair” economics. If a proposition did not lend itself to statistical veri- 
fication, he had little use for it. 

Secrist was widely read in the literature of his fields of interest. He 
was acquainted with the writings of the leading scholars both at home 
and abroad, and he was constantly searching the literature of scientific 
research for proven methods of establishing the truth. Regardless of 
the character of the statistical problem that was occupying his atten- 
tion, whether it be some governmental question that required an im- 
mediate answer, or a problem that was perplexing a group of business 
men, he never lost sight of his scientific interest in the data. Often after 
the immediate problem had been disposed of, an article would appear 
from his pen which disclosed that the data had been subjected to fur- 
ther analysis for the purpose of discovering what contribution they 
might make to our understanding of economic relations. 

Secrist was always a rigorous teacher. He had set high standards of 
attainment for himself and he expected similar ones from his students. 
Many a student who complained, while in college, of Secrist as a hard 
task-master, soon learned upon entering a business career that the discipline
he had received in Secrist’s classes was his best asset. Graduate
students always benefited greatly by work with him. He opened their 
eyes to the rigors of scientific method and developed within them a 
wholesome respect for scientific caution. 

He was author of a large number of books and monographs dealing 
with statistical problems, but these problems were always treated with 
an eye for their social and economic implications. In addition to his 
own scholarly contributions, he was active for many years in the work 
of the Social Science Research Council, having served as its secretary 
for several years. He frequently turned his statistical knowledge and 
skill to the solution of some governmental problem. In my judgment, 
he developed the most effective method of determining “bulk-line cost” 
as a means of adjusting price to output in war times. If this method had 
not been kicked out of governmental councils in our present war emergency,
we would now be hearing less about shortages, “roll-backs” and
subsidies.

For years Secrist suffered from a physical disability that finally 
caused his death. Nevertheless during all this time he maintained his 
scholarly activity and produced some of his best work. Throughout his 
career he lived the life of a true scientist and is an example to be 
emulated by others who wish to contribute to a better understanding
of our economic and social life.

F. S. DEIBLER


COMMITTEE ON NOMINATIONS 


President Goldenweiser has appointed the following members as the 
Committee on Nominations for 1943: F. Leslie Hayford, formerly Econ- 
omist of the General Motors Corporation, New York City, Chairman; 
Professor F. L. Carmichael, School of Commerce and Bureau of Busi- 
ness and Social Research, University of Denver, Denver, Colorado; and 
Professor Douglas E. Scates, Duke University, College Station, Dur- 
ham, North Carolina. The report of the Committee on Nominations 
will be published in the November BULLETIN. 













BOOK REVIEWS 


GLENN E. McLAUGHLIN
Review Editor 


War without Inflation, by George Katona. New York: Columbia University 
Press. 1942. x, 213 pp. $2.50. 


A more informative title for this book might have been “The Art of Anti- 
Inflationary Propaganda.” For its main thesis is that with the economic 
weapons available for fighting inflation, certain psychological conditions must 
be fulfilled if the fight is to be successful. To be more explicit: The exper- 
ience of every belligerent country has shown that it is practically impossible 
to close the “gap” by fiscal measures alone. Therefore, it becomes necessary 
that if inflation is to be avoided there must be a substantial increase in the 
rate of voluntary saving. For, if the public were determined to spend its pre- 
war proportion of its disposable income, direct controls would be powerless 
to prevent inflation. For an adequate rate of voluntary saving to exist, it is 
necessary that the public should have confidence in the Government’s anti- 
inflationary program. To the extent that confidence is lacking, the rate of 
spending must increase. 

There is, of course, nothing novel in this general thesis, but Governmental 
technique does not always indicate an awareness of it. Indeed, as the author 
points out, the actions of public spirited people with the best intentions often 
contribute to the wrong psychological attitude being produced. For the 
greater the eminence of the administrator warning against inflation, the more 
reason the public has for believing in its imminence. Also, the requirements 
of sensational journalism tend to produce the wrong impression. If the Presi- 
dent announces that measures are to be taken to avoid a potential increase 
in the cost of living of 25 per cent, the resulting headline will probably be 
“25 per cent increase in living costs feared.” The way to avoid all this com- 
pletely is as obvious as it is sinister; but a substantial contribution could be 
made if the press would sacrifice its headlines, which it could without sacri- 
ficing its freedom. 

The book is full of interesting suggestions but not conclusions as to the 
best psychological approach. For instance, in organizing a voluntary saving 
campaign, is it more effective to dramatize the need for sacrifice or to say 
that it is more sensible to save now in order to secure an automobile or a ra- 
dio in 1947? The author makes the important point that a tax program 
should be suitable to impress the public that the Government is determined 
to fight inflation. With the tax prospects of an election year, this may be the 
main contribution that the 1944 taxes will make. 

While this book hardly deserves to be required reading for administrators, 
it should stand high on the “recommended” list. 

A. SMITHIES 


Washington, D.C. 


Outlay and Income in the United States, 1921-1938, by Harold Barger. New 
York: National Bureau of Economic Research. 1942. xxvii, 391 pp. $2.50. 


It is a truism that the value of the national product during a given period 
may be determined by summating either the values of the services rendered 
in production, or the values of the goods and services constituting the final 
product. The standard estimates of the national product of the United States 
are estimates of the income received by the agents employed in its produc- 
tion. Mr. Barger is not the pioneer in estimating independently the aggre- 
gate outlay for the final product. Lough and Warburton, to mention no 
others, have preceded him. But Barger presents the first essentially inde- 
pendent estimate of outlay which covers a series of consecutive years. It is 
therefore possible for the first time to compare income and outlay estimates 
as series, and national income literature is the richer. The purpose of the 
volume, the author states, is to contribute to materials for the study of the 
business cycle. Comparison of outlay and income estimates, and analysis of 
relations between components, may, however, be as useful to other students 
of the national income as to cycle analysts. 

In addition to annual and quarterly estimates in current dollars of income 
and outlay for the period 1921-1938, the volume presents a stimulating dis- 
cussion of concepts, useful running commentary on methodology, and specu- 
lation both qualitative and statistical concerning sources and margins of 
error in the estimates. Throughout, the study evidences the work of a fertile 
mind. 

The annual data for income, and with one bold exception the income con- 
cepts, are Kuznets’. Kuznets’ interest is basically in the social product as 
such; Barger’s is avowedly in variation over the cycle. To the reviewer, it 
seems that only failure to adjust the data to the change in purpose can ac- 
count for retaining Kuznets’ treatment of depreciation and depletion, in- 
ventories, and capital gains and losses. Since in each case the alternative 
figures are presented in subsidiary tables, the point is perhaps not important. 
But Barger is certainly not correct in asserting that, given consistency, the 
concepts adopted are immaterial. On the controversial matter of the product 
of government, Mr. Barger “out-Herods Herod” by assuming that all gov- 
ernment product covered by taxes of any kind is intermediate product, and 
deficit expenditures alone measure the value of the final product of govern- 
ment. As a measure of total product in the system, this treatment seems in- 
defensible, even as a choice among available alternatives. But considering 
the purpose of the volume, there may be a better defense for Barger’s treat- 
ment than any defense which he presents. Any study of causal forces in 
cyclical fluctuations must in one way or another relate movements in output 
or activity totals to changes in the level of savings and investment. But the 
total to be so related is the total of output net rather than gross of tax-covered 
public product. For, to a first approximation, the propensity to save of gov- 
ernment is zero, and public tax-supported activity is therefore not a part of 
the total which is uniquely related to the current level and rate of change of
savings. Public and private activity are not alternatives. Rather, public
tax-supported activity is superimposed upon the level of private activity.

Not the least attractive and stimulating feature of the volume is the frank- 
ness and clarity with which Mr. Barger discusses the shortcomings of the 
data. The statistical analysis of probable margins of error is also valuable 
though it is a little startling to see the outlay and income data for a given 
year treated as though they were a random sample of two drawn from an in- 
finite population of estimates of the national product. 

The statistical analysis might well have been carried further, as Richard 
Stone has pointed out in a review article.¹ As Stone shows, the residual
discrepancies between the outlay and income estimates, after a constant
difference has been subtracted, are significantly correlated with change from
the preceding year in physical volume of inventory, with profits from inven- 
tory revaluation, and (perhaps not significantly) with time. Here are prom- 
ising clues for further investigation. 

Stone points out, too, that Barger is hasty in concluding that the two se- 
ries cannot be brought together into one. They cannot be reconciled in the 
usual sense, it is true. But a weighted average may be secured which pre- 
sumably would be more valid than either of the series averaged. It “would be 
possible on the lines suggested by [Stone] and others to assess measures of 
precision for the component series which by suitable combination would yield 
approximate weights for the two measures of the national product, and also 
weights for distributing the adjustment in each series over its component 
items.”² The single series so derived might be highly useful in further
statistical analysis of cyclical behavior.
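
The kind of combination described here can be illustrated with a precision-weighted average. The sketch below assumes inverse-variance weights and independent errors in the two series, which is one common choice and not necessarily the procedure Stone proposes; the function name and all figures are hypothetical.

```python
def combine_estimates(income, outlay, var_income, var_outlay):
    """Precision-weighted average of two estimates of the same total.

    Weights are proportional to the reciprocal of each estimate's error
    variance (an assumed weighting rule, chosen for illustration), and
    the two errors are assumed to be independent.
    """
    w_income = 1.0 / var_income
    w_outlay = 1.0 / var_outlay
    combined = (w_income * income + w_outlay * outlay) / (w_income + w_outlay)
    combined_var = 1.0 / (w_income + w_outlay)
    return combined, combined_var


# Hypothetical figures in billions of dollars, not taken from Barger.
estimate, variance = combine_estimates(80.0, 84.0, var_income=1.0, var_outlay=3.0)
print(estimate, variance)
```

Under these assumptions the combined figure lies closer to the more precise series, and its error variance is smaller than that of either series taken alone.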

EVERETT E. HAGEN

Washington, D. C.

¹ “Two Studies on Income and Expenditure in the United States,” Economic Journal, April, 1943 (Vol. LIII), pp. 60-75.
² Loc. cit., p. 74.


International Trade and the National Income Multiplier, by Fritz Machlup.
Philadelphia: The Blakiston Company. 1943. xvi, 237 pp. $3.50.


In recent years it has become common to meet in economic discussions the
formidable terminology of the multiplier, to the acute discomfort of the un-
initiated. Professor Machlup has attempted to repair the deficiency of ele-
mentary expositions of this subject, and if any conscientious reader com-
pletes this book without a fair notion of what is involved in the multiplier
concept, it will not be because of any lack of pedagogical zeal on the part of
the author. The subject is developed in painstaking, and at times almost
painful, detail. Although the discussion concentrates upon the relationships
between countries, the theory of the domestic multiplier is developed by im-
plication, so that the book will also appeal to those who have no special
interest in international trade as such.

By avoiding the use of all but the simplest algebraic symbolism, the author
has succeeded in compressing into a book of some 200 pages what could
otherwise be presented in a single mathematical article of some complexity and
limited appeal. Numerous arithmetical model sequences are presented under 
a variety of assumptions, involving variations in numerical coefficients and 
in the initial disturbances assumed. These models will tax the energy of the 
reader, but if the essential principles of the first one are understood, then so
should be the other thirty or so, including the “super-duper” on page 100. 
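
An arithmetical model sequence of the general kind described can be sketched in a few lines. The following Python example is a generic textbook-style illustration and does not reproduce any of Machlup's tables: it assumes a single country with no foreign repercussions, a one-time injection, and constant marginal propensities to save and to import. The function name and the coefficients are hypothetical.

```python
def multiplier_rounds(injection, mps, mpm, rounds):
    """Return the income generated in each successive spending round.

    mps: assumed marginal propensity to save
    mpm: assumed marginal propensity to import
    In each round the fraction (1 - mps - mpm) of the previous round's
    income is respent on home goods; the rest leaks into savings or imports.
    """
    respend = 1.0 - mps - mpm
    increments = []
    current = injection
    for _ in range(rounds):
        increments.append(current)
        current *= respend
    return increments


# Hypothetical coefficients chosen only for illustration.
steps = multiplier_rounds(injection=100.0, mps=0.2, mpm=0.3, rounds=50)
cumulative = sum(steps)
limit = 100.0 / (0.2 + 0.3)   # limiting value of the geometric series
print(round(cumulative, 6), limit)
```

The round-by-round list corresponds to the transient response of the system, and the sum of the geometric series to its limiting final state. Changing the coefficients or the size of the injection gives further sequences on the same principle.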

There are nine chapters and two mathematical appendices. Chapters I, 
II, VIII, and X are not very technical in nature. In these, Professor Mach- 
lup discusses the background of the multiplier models, assumptions, qualifi- 
cations, policy implications, etc. The reviewer was not satisfied at every 
point with the author’s analysis: e.g. the monetary explanation of the work- 
ings of the multiplier; the discussion of the Robertsonian period definitions 
of saving and investment; the time length of the interval between successive 
“rounds”; the belief that multiplier models must assume continuing streams 
of autonomous change; the brief discussion of capital flight; the largely ter- 
minological difficulties conjured up in the discussion as to whether the trade 
balance causes foreign investment or vice versa; the statistical and concep- 
tual problems in empirical verification of the multiplier hypothesis. Never- 
theless, there is so much that is excellent in these pages, and the subject 
matter is so important, that no student of the cycle can neglect to read and 
reread them. 

The remaining technical chapters constitute the heart of the book, and al- 
though more complex, they nevertheless make simpler reading. With many 
variations, three main cases are studied: (1) the effect of a shift of demand 
from domestic to foreign products, upon the home country and the foreign 
country; (2) the effect of increased investment in one country upon both 
countries; and (3) the effect of simultaneous (opposite) changes of invest- 
ment in both countries upon each country. The first is worked out in greater 
detail, for more than two countries, with and without savings, for many com- 
binations of coefficients, in terms of the transient response of the system as 
well as the limiting final state, with algebraic analysis and elementary dif- 
ference equation theory. Most of this is embodied in the masterly Chapter V 
and its two appendices. The reviewer could have wished, however, that the 
author had explicitly used the fact that cases (3) and (1) are really identical
except for bracketing of terms. Actually, case (2) is the most important of 
all, not only empirically but also because the others can be derived from it 
by simple and natural superposition. By concentrating upon it, the length 
of the book could have been halved leaving room for further discussion of 
price and interest changes, imperialism, and other policy problems. Per- 
haps, too, the book would have been more useful if the author had expounded 
in greater detail the contributions of Metzler, Salant, and Laursen. However, 
as it is, the book is an important contribution which deserves to be in the 
library of all modern economists. 

PAUL A. SAMUELSON


Massachusetts Institute of Technology 
















Economic Equilibrium, Employment and Natural Resources, by L. R. Nien- 
staedt. Bloomington, Indiana: The Principia Press, Inc. 1942. xvi, 412 pp. 
$4.00. 


This book will appear strange to many economists in that much of the 
content deviates from that found in traditional analyses of economic prob- 
lems. An attempt is made to develop a reasonably complete statement of mac- 
rocosmic balance in terms of factors involved in physical production. Bal- 
ance is conceived as an adjustment between production and consumption 
and derived by relationship to agricultural output in a Physiocratic frame- 
work. The relationship to agricultural output is conceived to vary with the 
stage of industrialization. The system is developed on three major relation- 
ships: one—the agricultural harvest to other economic production; two— 
human and machine capacity to production; three—physical factors to the 
flow of money values in producing national income. Nienstaedt denies the 
validity of market-price adjustment. He sees the weakness of microcosmic 
balance in an economy which varies as widely as ours has in recent years. 

A series of relationships is developed and checked against statistical in- 
formation relating to the United States’ economy, principally for the period 
from 1880 to 1929. The checking of this system against the statistical data 
of the United States is too involved to be summarized here. One critical 
measurement may be noted because of its significance in Nienstaedt’s denial 
of the validity of market-price adjustment. The ratio of currency in circu- 
lation to national income is found to be fairly stable within given periods. 
Stated in the simplest terms, national income is derived from production on 
the one hand and related to the flow of money values by a constant pro- 
portion to currency in circulation on the other, so that market-price adjust- 
ment is held to have little determining influence. To achieve this strange 
conclusion it is necessary to ignore the influence of bank deposits. 

Nienstaedt holds that equilibrium developed in terms of physical produc- 
tion makes possible the complete integration of the business cycle with the 
system of equilibrium itself. He thinks of the business cycle as being made 
up of random causes of the Slutsky effect, time lags as suggested by Rosen- 
stein-Rodan, and the flow of velocities at individual production stages as 
suggested by Amoroso.

One of the conclusions reached is that balance under present conditions 
can be achieved only where a considerable part of agricultural production 
enters the market in free exchange with all other products. This leads to a 
suggestion of “Industrial Village Estates” as a method of stabilizing em- 
ployment. Of major importance in his system is the balance of capacities. 
Therefore, another method of stabilization suggested is the registration of 
technical productive capacities for which certificates may be issued with 
the requirement that such certificates be obtained before connecting pro- 
ductive machinery. 

The book evidences a good deal of work and, although the reviewer is unable
to accept most of the material in its present form, he suggests that the
study is worthy of the attention of those interested in the development of
ideas of macrocosmic balance.

ELMER C. BRATT


Lehigh University 


The Shifting and Incidence of Taxation, by Otto von Mering. Philadelphia: 
The Blakiston Company, 1942. xiii, 262 pp. $3.25. 


This book presents a closely reasoned analysis of shifting, incidence, and 
effects of taxation, “largely built upon [the author’s] . . . earlier work,
Steuerüberwälzung” (1928). For most of the problems, its approach is traditional
in the use of partial equilibrium supply-and-demand mechanics, but modern 
in the application of recently developed geometrical techniques, especially 
Joan Robinson’s. Some of the proofs in Part I, “The General Theory of Tax 
Shifting,” and the proofs in the appendix are developed by mathematical 
analysis. 

Much of the contents will probably be new to the general student of pub- 
lic finance, for Professor von Mering covers an impressively wide range of 
problems, utilizing analyses gathered from journals and specialized works 
both here and abroad. The material is presented not as a history of doctrine 
but as a fairly well integrated series of problems, the author selecting in each 
case the source that he evidently considers the best for that topic. The 
twenty pages on tax shifting under a monopoly, for instance, draw successively
on Böhm-Bawerk, Menger, Conigliani, Joan Robinson, Cournot, Edgeworth,
Gilbert, Fagan and Jastram, Seligman, H. G. Brown, Dalton, Garver,
Stamp, A. L. Meyers, Wicksell, and Hotelling; and include an original treat- 
ment by von Mering himself of the difference between the effects of a per 
unit tax and a gross receipts tax set at a rate to equal the per unit tax (at the 
price prevailing before tax). 

The attempt to cover so many difficult problems in so compressed a space 
makes real understanding of some of the analysis difficult, in some places 
probably impossible. Part of the difficulty can be traced to a lack of ex- 
plicitness in the assumptions underlying the analysis; part, to the use of 
specialized concepts (like “the investing period,” pp. 122-24) that the author 
expects the reader to be familiar with already or to assimilate by a study of
the original sources. Aside from these two points, however, the book is re- 
markable for its clarity and directness of expression; the author wastes no 
words. 

Part II, “The Shifting of Particular Taxes,” is not quite up to the level 
of Part I. The reasoning seems to lapse occasionally from its generally high 
level of dependability (or perhaps it is merely that the assumptions are less 
explicit), especially in the tendency to emphasize the extensive margin but 
not the intensive (e.g. example on p. 166; and cf. Edgeworth, Volume II, p. 
83). It takes a good deal of caution and trouble to get safely through parts
like pp. 184-88, where the “supply of labor” evidently refers to changes in the
number of the working force, in the “elastic” case, and to changes in the per- 
centage of the working force that is employed, in the “inelastic” case (hence 
the meaning of the conclusion that the workers bear the tax is not clear), 
while the inelastic “demand for labor” seems to run in terms of money wages, 
in contrast to the real-wage analysis in the preceding cases. And in weigh- 
ing the conclusions reached by the author, it must be recalled that even in the 
analysis of widespread taxes like those on all retail sales or on wages in gen- 
eral, the fact that some disposition has to be made of the tax revenue is usu- 
ally excluded from the scope of the analysis. 

For those who have a special interest in the economic aspects of public 
finance the book will give a substantial reward, if it is absorbed slowly and 
carefully. 

CARL S. SHOUP

Columbia University 


Federal Tax Course, by George T. Altman. Chicago: Commerce Clearing 
House. 1943. Loose-leaf. $10.00. 


Although intended more for study than as a reference work, this course 
has been so designed as to serve both purposes, and to provide readily the 
answers to everyday questions. The entire federal tax structure together 
with its functioning is adequately covered, but the book rightly lays the 
greater stress upon the Federal income tax. Accompanying the course, which 
consists of some 1400 or more pages in a well-indexed and tabbed loose-leaf 
binder, the publishers furnish a special book containing the income, estate, 
and gift tax provisions of the Internal Revenue Code as amended to date, 
and also a separate book of filled-in tax return forms. 

The course analyzes not only the statutes, but more significantly, Treasury 
regulations and Supreme Court decisions as they modify the basic law. The 
course is divided into four main parts: (1) explanatory text; (2) problems 
and solutions; (3) specimen income-tax returns; and (4) official texts of the 
income tax and excess profits tax regulations. The closely packed 11-page 
glossary of tax terms, the topical index, and the table of cases cited and dis- 
cussed in the explanatory text round out the day-to-day usefulness of this 
book. 

The explanatory text traces the Federal revenue system back to the early 
colonies. It shows how the tariff as a source of revenue was first supplemented,
and later supplanted, by the internal revenue, which now leans
heavily upon income and profits taxes, but with no inconsiderable support 
from excises, gift and estate taxes, employment taxes, and other miscellane- 
ous internal revenues, many of which started as “temporary” taxes and 
finally became an integral part of our tax structure. 

The text then analyzes the Internal Revenue Code and “other sources of 
the law,” meaning tax laws not incorporated in the Code, Treasury Department
regulations and rulings, decisions of the Tax Court of the United States
(formerly known as the United States Board of Tax Appeals), and decisions 
of other courts. Also included here is a short section on the power of the states 
to levy income taxes. 

The main part of the book concerns itself with the federal income tax as 
it applies to individuals and corporations. Some of the topics dealt with, to 
mention but a few, are: classes of taxpayers, taxable incomes, rates and ex- 
emptions, business expenses, depreciation, depletion, amortization, interest, 
taxes, medical expenses, and so forth. The remainder of the explanatory 
text covers excess profits and capital stock taxes, excises, sales and process- 
ing taxes, taxes on facilities and admissions, social security taxes, and finally 
estate and gift taxes. 

The succeeding section of the course presents problems and questions il- 
lustrating the salient features of the tax laws. These problems are classified 
in the same manner as the explanatory text, and are appropriately keyed in 
with it. Specimen filled-in returns for individuals, partnerships, corpora- 
tions, estates and trusts, and excess profits, plus sample worksheets and ex- 
planatory instructions, increase the practical value of this volume and its 
accompanying materials. 

The broad coverage of the course and its ready application extend its 
usefulness beyond the student to the lawyer, accountant, and other profes- 
sional worker, as well as to those in government and business. 

JAMES D. PARIS

Great Neck, New York 


Inter-American Statistical Yearbook, 1942. Edited by Raul C. Migone, with
the assistance of Marcelo Aberastury, Emilio Fuente, and Jorge E.
Iturraspe, under the auspices of the Comisión Argentina de Altos Estudios
Internacionales. New York: The Macmillan Company; Buenos Aires:
El Ateneo; Rio de Janeiro: Freitas Bastos & Cia., 1942. 1066 pp. $10.00.


The Inter-American Statistical Yearbook, 1942, represents the second edi- 
tion of the publication, the first having appeared in 1940. As in the case of 
the first edition it was prepared under the Argentine Commission of High 
International Studies and edited by Dr. Raul C. Migone. 

A distinctive feature of the book is that the data in the tables are presented 
in the four principal languages spoken in the Western Hemisphere, in order 
of their importance: Spanish, English, Portuguese, and French. This makes 
the information readily available for reference by any of the four lingual 
groups, and should, in time, serve as a stimulus toward an understanding and 
appreciation of the other languages with which the majority of readers are 
not so familiar as they are with their own. 

The major arrangement of the book is by subjects, rather than by countries.
This affords a ready comparison of the importance of each area in terms
of a given subject. The principal subject headings are as follows: Population,
production, industries, commerce (foreign), social questions, currencies, 
banks, and investments, public finance, public education, public health, in- 
ternational cooperation, and sources. Although most of these subject head- 
ings are self-explanatory, there are two which deserve special mention— 
social questions and international cooperation. 

The data presented under social questions are especially valuable in mak-
ing comparisons of living standards, and in reaching an understanding of
how each country utilizes its natural and human resources. These data in- 
clude information regarding the gainfully employed population, wages and 
hours, unemployment, strikes, cost of living, retail prices, autosufficiency of 
foodstuffs, average per capita consumption of the principal foodstuffs, and 
distribution of expenditures for the more important groups of items in the 
cost of living. 

The section on international cooperation presents in tabular form a com- 
plete record of the formal relationships of each country with others, under 
the following headings: Participation in international or continental organi- 
zations, agreements and conventions concluded under the auspices of the 
League of Nations, Pan-American treaties and conventions, international 
labor conventions, and bilateral trade agreements. It is unusual to find infor- 
mation of this type arranged in a form so convenient for reference. 

A section in the preface is devoted to conversion coefficients for weights 
and measures used in the different countries, and includes a tabulation of the 
approximate exchange values of the various currencies. This feature adds 
much to the value of the book in that the reader does not require additional 
references to interpret quantities and values. 

Myron E. Andrews


Washington, D. C. 


World Minerals and World Peace, by C. K. Leith, J. W. Furness, and Cleona 
Lewis. Washington: The Brookings Institution. 1942. xii, 253 pp. $2.50. 


This study was suggested by Dr. C. K. Leith, geologist of the University 
of Wisconsin, whose interest in the relation of minerals to peace dates from 
World War I, when he was mineral adviser to the War Industries Board and 
to the American Commission to Negotiate Peace. Dr. Cleona Lewis of the 
Brookings Institution staff, who collaborated with the other authors, Dr. 
Leith and J. W. Furness (formerly chief of the Economics and Statistics 
Branch of the Bureau of Mines) has written several volumes on international 
economic relations. The result of this joint effort is a volume that merits the 
attention of all those interested in raw materials as a factor in world peace. 

A substantial background of factual information on trends in mineral pro- 
duction and trade is presented in Part I of the volume. Part II discusses the 
recent trends in controls through monopolies, cartels, nationalization, and
other political and economic policies. The third and final part delves into a
necessarily less objective consideration of future access to raw materials and 
mineral control in relation to peace. 

The year 1938 is chosen for most of the recent international comparisons of 
mineral output. Although that year was the last full calendar year before 
the outbreak of the major conflict in Europe, the selection is open to criti- 
cism on the ground that the sharp recession of industrial activity in the 
United States that year makes a comparison with other countries less favor- 
able than it would have been for a more nearly normal year, such as 1936 or 
1937. However, data presented in Chapter I and Appendix A for certain min- 
erals over a period of years enable an evaluation of the effect of this difference 
in cyclical position. 

Among the more interesting conclusions are that the so-called “have not” 
nations really did not possess much of the world’s mineral production, and 
efforts to develop domestic supplies met with little success. The proportion 
of total mineral output controlled by the Axis powers is placed at 11 per cent 
in 1939 and, as a result of conquest, at 25 per cent at the end of 1942. It is 
pointed out that the “have nots” were not denied access to minerals. On the 
contrary, they benefited from keen competition for the sale of minerals in 
world markets. 

The final chapter, written by Dr. Leith, develops the thesis that the use of 
minerals should be restricted to peaceful pursuits. It is conceded that those 
who elect to prevent the would-be aggressor must possess most of the min- 
erals. The reviewer confesses that he is left uncertain as to how the follow-up 
control would be accomplished and still retain the freedom acclaimed by the 
Atlantic Charter. 

Perhaps if the nations in control of major mineral resources had helped the 
German democracy prosper as readily as they made economic concessions 
to the might of the Nazis almost from the start, there would not have devel- 
oped the occasion for the bogus claims that raw materials could not be ob- 
tained at the very time that they were being hoarded hand-over-fist in prep- 
aration for war. 

Wilbert G. Fritz

War Production Board 


Principles of Marketing, by Fred E. Clark and Carrie Patton Clark. New 
York: The Macmillan Company. 1942. xx, 828 pp. $4.50. 


This is the latest—and perhaps the last—general textbook on marketing 
to be published during the war. It is not that popular interest has waned in 
the field of distribution but that the exigency of war has shifted emphasis to 
subjects of industrial production and away from marketing topics. After the 
war, there will be disproportionment in trained marketing men as there was 
in trained production men when we entered the war. If experience at the 
close of the last war follows this war’s end, there again will be “boom days”
in formal classroom instruction at collegiate schools of business administra-
tion.

It is fortunate, then, that Dr. Clark has revised his text on marketing. 
This third edition appears twenty years after his first edition; it aptly sum- 
marizes events of the years in between and touches on war-time economy in 
marketing only incidentally. Since war-time controls are not likely to be ex- 
ercised in a peace economy, the experiences of current marketing operations 
may be left to others to view in a proper historical perspective. 

Any student of marketing welcomes Dr. Clark’s annotated text. There 
are footnotes on nearly every page and a bibliography at the close of each 
chapter. To the casual reader this presentation may deter his speed in read- 
ing, but the references serve well to correlate developments in a rapidly 
changing field which Dr. Clark has ably documented in the period of peace 
between wars. 

To those who are acquainted with the early text, it should be added that 
the original character of the book has not been changed; the form of treat- 
ment is functional. Each chapter has been thoroughly revised with additional 
material, charts and illustrations. In particular the chapters on retailing have 
been expanded with special reference to voluntary and cooperative chains 
and to supermarkets. 

The chapters on manufacturers’ problems include new material on the 
marketing of business goods and selective selling. The treatment on manu- 
facturer-middleman relations has been clarified. One chapter has been de- 
voted to the nature of the consumer movement and the various agencies and 
activities connected with it. Controls and legislation affecting marketing, 
now so extensive and far-reaching, have been expanded into two chapters. 
Dr. Clark’s text always has been well-oriented on agricultural marketing 
subjects. The book is divided into twenty-eight chapters together with an 
appendix on marketing reports which are suggested for classroom problems. 

Howard T. Hovde

Wharton School of Finance and Commerce 

University of Pennsylvania 


The Road We Are Traveling, by Stuart Chase. New York: The Twentieth 
Century Fund. 1942. 106 pp. $1.00. 


In the preface, Stuart Chase writes: 


The United States may come out of this war the strongest nation on earth. 
To be worthy of that strength, we should take the lead in plans for perma- 
nent co-operation among our allies. .. . Our task will begin, even if it does 
not end, at home. 


And then he gives us the first of six exploratory reports on post-war prob- 
lems under the title The Road We Are Traveling: 1914-1942. It is likely that 
when all six reports have been completed, the Twentieth Century Fund will
publish them in a single volume called When the War Ends. For just as the
course of the war will do much to shape the peace, so intelligent thinking 
about the peace may affect the course of the war. History, as has been said 
before, is a seamless process, part idea, part event, with the two interacting 
at all times. 

Since this is the introductory volume to Mr. Chase’s projected series, its 
material is fairly familiar. The first half of the book is a sprightly recapitu- 
lation of the events of 1914-1942, a period in which governments every- 
where discovered the conveniences of deficit financing and the necessity of 
giving public employment or subsidies to those who could no longer find work 
in private industry. One may have misgivings about the whole 1914-1942 
pattern, but personal prejudice does not banish the realities which Mr. 
Chase describes. 

The second part of Mr. Chase’s book consists of an inventory of basic 
trends. These are divided into causes and effects, the causes being technolog- 
ical advance, the flattening population curve and the closing frontier. The 
effects he mentions are interdependence, unemployment, the decline of in- 
vestment opportunity, shrinkage of the free market, high-pressure selling, 
mechanized warfare, government in business, and a trend toward national 
self-sufficiency. 

Mr. Chase offers pretty clear going in most of the book but I keep won- 
dering whether his cause-and-effect sequence is entirely adequate. Why 
should we assume, for example, that technological advance must lead to a 
decline of investment opportunity? And why should we think that flattening 
population curves and closed frontiers are here to stay? If government 
action can prevent the stiffening of prices and the limited use of patents, 
technical innovations should actually compound the opportunities for in- 
vestment. 

Then too, this war will have a tremendous impact on our attitude toward 
family sizes; in the future no sovereign nation will dare to let itself be ma- 
neuvered into the position of France, where the old began to outnumber the 
young. With the teeming Orient only a step away, Australia will certainly 
not be satisfied with a population of 7,000,000. Russia will seek millions of 
new people to exploit Siberia, which probably will remain beyond effective 
bombing range from Western Europe for years to come. 

Actually, this war should result in a dozen new frontiers. The road to 
Alaska, if properly exploited as a substitute for such palliatives as WPA and 
the British dole, should be the equivalent of a minor Louisiana Purchase. 
For the North Country, as Stefansson persists in pointing out, can support 
people who are willing to work within its limitations. Our own Pacific North- 
west with its unused water power is a potential frontier vast enough to ab- 
sorb every complaining Okie. 

Mr. Chase errs, I think, in assuming that State interference in the econom- 
ic process must point to more government investment rather than less. We 
have always had State interference and State subsidies: the Homestead
Act was a gigantic land subsidy, and the building of the American road sys-
tem was a free gift to that rugged individualist, Mr. Henry Ford. Why can’t
government interference and subsidization of the future be of a type that 
will work to free the individual enterpriser as they have in the past? The 
Farm Security Administration points the way in a modest fashion. 

Too often our thinking lags behind our opportunities, and our courage 
cannot even match our thinking. We monkey around with trying to maintain 
the status quo in cotton and wheat; but if we rewarded farmers for diver- 
sifying their crops instead of giving them money to add to cotton and wheat 
carry-overs, we might discover shortly that freedom from the necessity of 
farm subsidies is just around the corner. Has any farm family yet had its fill 
of tomato juice? At any rate, it is nonsense to suppose that wheat should 
eternally bring a dollar when every year sees money shaved from the cost of 
industrial products. Instead of giving the farmer money for taking fields out 
of use, why not try to give him a cheaper tractor, a more efficient pump? 

Inasmuch as he has five more volumes to write, Mr. Chase probably will 
canvass all possibilities. He may indicate later that he does not consider a 
vast increase in government investment to be the only post-war alternative 
before us. Government investment may be one “way-out,” but, as Hilaire 
Belloc says, when the “determining number” of voters depends on the State 
for livelihood, freedom usually disappears. When political power grows too 
strong at the expense of social power, democracy usually dies. 

Forrest H. Kirkpatrick

Bethany College 


Harvard Cooperative Society, Past and Present, by N. S. B. Gras. Cambridge:
Harvard University Press. 1942. xi, 191 pp. $1.50. 


Professor Gras has written a fascinating little book detailing the history 
of a cooperative enterprise in the Harvard environment. In sprightly 
style, he tells how this organization developed in sixty years a membership 
equity of $800,000, annual sales of $1,400,000, a total membership of 14,000, 
and a current dividend rate on members’ purchases of 12 per cent. The ac- 
count is sympathetic, yet well balanced, and does not suffer from lapses into 
uncritical eulogy. 

The significance of the volume goes far beyond the confines of the Cam- 
bridge chain of enterprises. As Professor Gras suggests, it is a study of 
“Twenty years of cooperation and forty years of success” in following the 
best traditions of private business. But it is also a case study in the encroach- 
ment of private business ideology on a pattern of consumers cooperation— 
an encroachment which has similarly affected thousands of other coopera- 
tive enterprises, ranging from mutual insurance companies, through coopera- 
tive banks to farmers’ telephone companies. In a capitalistic milieu, the dem- 
ocratic idealism of cooperative pioneers has frequently faded from sight.
Compromises and adjustments have been made with surrounding institu-
tions. The result has often been the development of an organization which 
is neither flesh nor fowl. It is no longer cooperation—perhaps it might be 
termed a mutual company. 

Put briefly, what happened to the Harvard cooperative was this: the So- 
ciety was launched by students on a fully democratic basis to reduce the 
cost of living in the Harvard area. It early faced problems of apathy, bad 
management, and boycotts. For a period, Professor Ames and Professor 
Taussig, and Harvard itself, gave significant assistance. Substantial growth 
ensued during the eighteen nineties, accompanied by increasing problems of 
accounting and control. An apathetic membership was faced with the ques- 
tion: “Should the original democratic way of life prevail or a new aristocrat- 
ic oligarchy be adopted?” 

The oligarchy won and in 1902 the Society super-imposed upon the orig- 
inal organization a complex scheme of voting stockholders in a newly- 
formed corporation. These voting stockholders, as trustees, were to elect the 
board of directors, although a sufficient number of membership votes might 
override the voting stockholders. 

Today, the re-assumption of control by the “ticket-holders” is “possible 
but not practical.” The Society is not affiliated with the Cooperative League 
of the U.S.A. Its reform enthusiasm is dead. Its efficient manager “keeps 
abreast of the time, partly by attending the meetings of business men who 
are wrestling with problems similar to his own.” He has secured considerable 
assistance from the Harvard School of Business Administration and the Na-
tional Retail Dry Goods Association in improving budgetary practice. “The
history of the society since 1903 . . . points to success in spite of, or because 
of a lack of sympathy with, the cooperative movement as generally propa- 
gated. One notable example of this has been the lack of any emphasis upon 
education in cooperation. The Harvard emphasis has been upon private busi- 
ness, notably in its School of Business.” 

Mr. Gras seems content to place the Harvard Cooperative Society along-
side the Ford Motor Company and R. H. Macy and Company as illustrations
of financial-industrial capitalism, possessed of managerial efficiency and free 
from restraining banking influence. He is a bit fearful of stagnation in the 
organization, of over-emphasis on profitable lines, of advancing costs, and 
of the failure to enlist the enthusiastic support and participation by students 
and teachers at Harvard and Tech. He is not pleased by the complicated 
constitution which “is a model in obscurity.” Yet, on balance, he finds the 
Society “a vital going organization” with “a life, a standing, a reputation, 
and a function to perform”—the result of conscientious effort of the many 
distinguished men who have served it well over its history. 

To the reviewer, this appraisal appears too favorable. A deeper significance 
appears in the fact that more than 8,000 members are completely disenfran- 
chised and that but a dozen of the 7,000 full-fledged members appear for 
ticket-holder meetings. The ten stockholding trustees have the controlling
power and have no thought of propaganda or education in cooperation. The
Society has achieved acknowledged business success; it has not contributed
an object lesson in democracy. Perhaps the latter would be too much to ex-
pect in the existing setting.

Colston Warne

Amherst College


Jewish Population Studies. Edited by Sophia M. Robison. New York: Con- 
ference on Jewish Relations, 1943. xvi, 189 pp. $3.50. 


Since the United States Census does not inquire into the religious affilia- 
tion of individuals, and the Census of Religious Bodies also does not comprise 
an actual enumeration of the numbers of any sect but a compilation from sec- 
ondary sources, the Jewish Population Studies, initiated by the Conference 
on Jewish Relations, are a new and important source of information. The 
studies were planned “on the principle that the communal work of Jewish 
organizations should be based on facts ascertained by scientific procedure.” 
Demographic analyses of 10 communities (by different authors) are included 
in the present volume: Trenton (Sophia M. Robison), Passaic (same author), 
Buffalo (Uriah Z. Engelman), Norwich and New London (Bessie B. Wessel), 
Pittsburgh (Maurice Taylor), Detroit (Henry J. Meyer), Chicago (A. J. 
Jaffe), Minneapolis (Sophia M. Robison), San Francisco (Samuel Moment). 
However, as the editor points out in an introductory chapter on methods ap- 
plied to gather data on the Jewish population, the results obtained in these 
cities are likewise based more or less on estimates or sampling procedures 
“because the arguments not to include the question of religious affiliation in 
the census blank outweigh those in its favor.” 

Most of the uncertainties in such studies arise from the difficulty of defin- 
ing the designation “Jew.” In the Chicago study a Jew is one whose death 
certificate indicates that he was buried by an undertaking firm whose busi- 
ness it is to conduct Jewish burials. In the Detroit study a Jew is one who has 
a Jewish name. In the other reports Jews are defined as persons born of Jew- 
ish parents or of mixed marriages, who are members of Jewish communal and 
religious organizations, and as those who when interviewed were willing to be 
identified as Jewish by “race” or religion. Other more methodical difficulties 
come in by using the estimated birth and death rates or the absence ratio of 
school children on the high Jewish holiday, Day of Atonement (“Yom Kip- 
pur method”), for estimating the total Jewish population on various assump- 
tions. The Buffalo survey only approached a complete census by a house-to- 
house canvass in those districts known to be densely populated by Jews; in 
other districts, however, and other places a sample canvass had to be em- 
ployed. 
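
The arithmetic behind the “Yom Kippur method” may be illustrated by a short Python sketch. It is only a sketch under stated assumptions and not the procedure followed in any of the studies: the function name, the allowance for ordinary-day absences, and the assumed share of school children in the Jewish population are all hypothetical, and the figures in the note that follows are invented for illustration.

```python
def yom_kippur_estimate(absent_on_holiday, usual_absent, school_age_share):
    """Estimate a total Jewish population from school absences (illustrative only).

    absent_on_holiday -- children absent from school on the Day of Atonement
    usual_absent      -- children absent on an ordinary school day (assumption:
                         this many would have been absent in any case)
    school_age_share  -- assumed fraction of the Jewish population that is
                         enrolled in school (hypothetical parameter)
    """
    if not 0 < school_age_share <= 1:
        raise ValueError("school_age_share must lie in (0, 1]")
    # Assumption: the excess of holiday absences over ordinary absences
    # approximates the number of Jewish school children.
    jewish_children = absent_on_holiday - usual_absent
    if jewish_children < 0:
        raise ValueError("holiday absences are below the ordinary level")
    # Scale the children up to the whole population by the assumed share.
    return jewish_children / school_age_share
```

With 1,200 children absent on the holiday, 200 absent on an ordinary day, and an assumed school-age share of one-quarter, the sketch gives an estimate of 4,000 persons. The result is only as good as the assumed share, which is why such estimates rest on various assumptions.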

Despite the differences and deficiencies in method, the results obtained 
are noteworthy as a good start toward compiling comparable characteris-
tics for the Jewish population in the United States as a whole. The results,
for instance, show that the pattern of age distribution, sex ratio, size of fam- 
ily, length of residence of the foreign-born was essentially the same in all the 
communities studied. Numerous tables give detailed information about these 
items. When compared with the total white population, the Jewish group in 
each city showed a smaller ratio of children of elementary school age, a more 
even distribution among the sexes and a higher ratio of naturalized citizens 
among the foreign-born. 

The most striking characteristic in all ten communities lies in the occupa- 
tional distribution of the Jews, especially in their concentration in trade and 
the professions, well-known from findings in other countries. In no city, 
states the editor in her summarizing conclusions, was the ratio of gainfully 
occupied Jews in trade less than 43 per cent (Passaic), whereas in Pittsburgh 
it was as much as 60 per cent. In no one of the cities were there less than 8.5 
per cent of all gainfully occupied Jews engaged in professional careers. In 
two of the cities, Pittsburgh and Norwich, Jews in the professions accounted 
for more than 9 per cent, and in Trenton and Passaic this ratio went up to 
12.3 per cent. Especially high within the professional group was the concen- 
tration of Jewish doctors and lawyers. For instance, in Trenton with a Jew- 
ish population of 6 per cent, the Jewish physicians and surgeons comprised 
two-thirds of the 122 persons practicing medicine, while among the 190 law- 
yers and judges the Jews constituted 40.5 per cent. These results are very 
similar to those obtained in the last complete census of Germany (1933) re- 
garding the vocational distribution of the Jewish minority and reveal, indeed, 
a serious problem for a minority group. The careful studies in these American 
cities although based on estimating and sampling methods deserve the full 
attention of all circles concerned until a complete enumeration and evalua- 
tion by census methods for all religious sects make the results still more 
representative. 

Georg Wolff


National Institute of Health 


An Introduction to Managerial Business Statistics, by Harry Pelle Hartke- 
meier. New York: Thomas Y. Crowell Company. 1943. 207 pp. $1.75. 


This book reflects the author’s extensive experience and firm conviction 
that the text for the first course in business statistics “should contain thor- 
ough and clear statements of a few topics” and should not attempt to cover 
the whole field in one volume. 

There are six chapters in the book: The Introduction outlines briefly what 
is meant by quantitative, qualitative, frequency-distribution, and time se- 
ries analysis, index numbers, and correlation and it relates them to problems 
of business management. Chapter 2 on “Quantitative Analysis” and Chapter 
3 on “The Theory Underlying Tests of Significance or Reliability” apply 
the theory of small samples and reliability to such practical problems as
choosing which brand of tires or paint to buy. Chapter 4 treats the subject
of “Computing Procedure and Machines.” Chapter 5 on “Qualitative Anal- 
ysis” applies the tests of reliability to such problems as the percentages of 
customers preferring specific brands, obtained by sampling. Chapter 6 on 
“Production Control” applies the sampling process to the measurement of 
the variability of the physical characteristics of products made by a machine 
or a factory and describes how to set up control charts which show man- 
agement when these physical characteristics vary beyond the limits of toler- 
ance. 
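
The idea of a control chart can be sketched in a few lines of Python. This is a minimal illustration, assuming limits set at three standard deviations on either side of the mean of a baseline sample; that convention is common but may differ from the procedure given in the book, and the function names and measurements here are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from a baseline sample.

    Assumption: limits are the baseline mean plus or minus k sample
    standard deviations, with k = 3 by default.
    """
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation (n - 1 divisor)
    return center - k * spread, center, center + k * spread


def out_of_limits(measurements, lower, upper):
    """Return the positions of measurements falling outside the limits."""
    return [i for i, x in enumerate(measurements) if x < lower or x > upper]


# Hypothetical baseline measurements of one physical characteristic.
lower, center, upper = control_limits([10.0, 10.2, 9.8, 10.1, 9.9])
# Later measurements are checked against the limits; flagged positions
# are those that would call for management's attention.
flagged = out_of_limits([10.0, 10.6, 9.4, 10.3], lower, upper)
```

In this invented example the second and third later measurements fall outside the limits, which is the kind of signal a control chart is meant to bring to notice.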

Because these chapters are written so that the subject matter may be un- 
derstood easily by a student who has not had any college mathematics, the 
use and derivation of statistical formulae are kept at a minimum. At the end 
of each chapter many useful work tables and charts are to be found as well 
as sets of practical business problems. 

The pages, 8½ by 11 inches in size, are punched for ring binders. These
details have been arranged in order that the book may be taken apart easily 
so that individual chapters, tables, or instructions may be inserted in a 
binder along with other student notes or work sheets. 

Unlike many other introductory texts, it does not discuss in detail such 
topics as the planning of statistical investigations, collection of data from 
various sources by different methods, tabulation, ratios, charting, index 
numbers or the different measures of central tendency, dispersion and cor- 
relation. These methods are also useful in solving business problems. For ex- 
ample, the median and the mode—as well as other statistical measures— 
have their applications in solving business management problems. Apparent- 
ly, it is the author’s intention to treat index numbers and other topics in a 
series of separate volumes of which this is the first. The author’s criticism 
of advertising, page 26, seems to be irrelevant, unjudicial, and based on an 
extremely small sample. 

Within the limited range of the subject matter of this book, the author 
has succeeded in explaining the practical applications of some phases of sta- 
tistical theory which are usually very difficult for the layman to understand 
and apply. It should fit well into the curriculum of a business school which 
endeavors to develop the management point of view. Business executives 
who possess a good educational background should read this book because it 
indicates how these methods may be applied profitably to the solution of 
many of their management problems. 

Donald R. G. Cowan














PUBLICATIONS RECEIVED*

Backman, Jules. Rationing and Price
Control in Great Britain. Washing-
ton: The Brookings Institution, Pam-
phlet No. 50. 1943. iii, 68 pp. 50
cents.

Berry, Thomas Senior. Western Prices
Before 1861. Cambridge, Massachu-
setts: Harvard University Press.
1943. xxi, 645 pp. $5.00.

Butterbaugh, Grant Illion. The Meas-
urement of Business Activity in the
Puget Sound Area. Chicago: The
University of Chicago Press, Studies
in Business Administration, Vol.
XIII, Number 2. 1943. v, 72 pp.
$1.00.

Chlepner, B. S. Belgian Banking and
Banking Theory. Washington: The
Brookings Institution. 1943. x, 224
pp. $2.50.

Crawford, Mary M. Student Folkways
and Spending at Indiana University,
1940-1941. New York: Columbia
University Press. 1943. 271 pp.
$3.50.

Dempsey, Bernard W. Interest and
Usury. Washington: American Coun-
cil on Public Affairs. 1943. xii, 233
pp. $3.00.

Dublin, Louis I. A Family of Thirty
Million. New York: Metropolitan
Life Insurance Company. 1943. xvi,
496 pp.

Fisher, Ronald A., and Yates, Frank.
Statistical Tables for Biological,
Agricultural and Medical Research.
London: Oliver and Boyd Ltd. 1943.
viii, 98 pp. 13 shillings 6 pence.

Freeman, Frank N., and Wenger,
M. A. The Chicago Mental Growth
Battery. Chicago: The University of
Chicago Press. 1943. v, 58 pp. $1.00.

Hartkemeier, Harry Pelle. An Intro-
duction to Managerial Business Sta-
tistics. New York: Thomas Y.
Crowell Company. 1943. xiv, 207 pp.
$1.75.

Heaney, Norman Stewart. Public 
Trusteeship. Baltimore: The Johns 
Hopkins Press. The Johns Hopkins 
University Studies in Historical and 
Political Science, Series LX, Num- 
ber 4. 1942. xiv, 130 pp. $1.50. 

Hoyt, Elizabeth E. Freedom from 
Want: A World Goal. New York: 
Public Affairs Committee, Inc., Pub- 
lic Affairs Pamphlets No. 80. 1943. 
31 pp. 10 cents. 

Kuczynski, R. R. The New Population 
Statistics. New York: The Macmil- 
lan Company. National Institute of 
Economic and Social Research, Oc- 
casional Papers, I. 31 pp. 50 cents. 

Labor Research Association. Labor and 
the War, Labor Fact Book 6. New 
York: International Publishers Com- 
pany, Inc. 208 pp. $2.00. 

Leith, C. K., Furness, J. W., and 
Lewis, Cleona. World Minerals and 
World Peace. Washington: The 
Brookings Institution. 1943. xii, 253 
pp. $2.50. 

Lyon, Stanley. Some Observations on 
Births in Dublin in the Years 1941 
and 1942. Dublin: Cahill & Co., Ltd. 
1943. 24 pp. 

Machlup, Fritz. International Trade 
and the National Income Multiplier. 
Philadelphia: The Blakiston Com- 
pany. 1943. xvi, 237 pp. $3.50. 

Mangus, A. R., and McNamara, Rob- 
ert L. Levels of Living and Population 
Movements in Rural Areas of Ohio, 
1930-1940. Wooster, Ohio: Ohio 
Agricultural Experiment Station, 
Bulletin 639. 1943. 61 pp. 

Mayo, Bernard, Editor. Annual Report
of the American Historical Associa-
tion, 1936, Vol. III, “Instructions to
the British Ministers to the United
States, 1791-1812.” Washington:
United States Government Printing
Office. 1941. xvi, 403 pp. $1.00.

* Annual reports and publications presenting statistics collected at regular intervals have been
omitted from this list. Some items of minor interest to statisticians have also been omitted. The con-
tents of periodical publications are not listed, but the attention of the reader is directed to the lists
of articles in current publications which are to be found in the Revue de l'Institut International de Statis-
tique, Journal of the Royal Statistical Society, American Economic Review, Population Index, Trans-
actions of the Actuarial Society of America, The Record of the American Institute of Actuaries, and Sankhyā,
the Indian Journal of Statistics.—Editor.

Migone, Raul C., Editor, assisted
by Aberastury, Marcelo, Fuente,
Emilio, and Iturraspe, Jorge E., un-
der the auspices of the Comision
Argentina De Altos Estudios Inter-
nacionales. Inter-American Statisti-
cal Yearbook, 1942. New York: The
Macmillan Company; Buenos Aires:
El Ateneo; Rio de Janeiro: Freitas
Bastos & Cia. 1943. 1067 pp. $10.00.

Motherwell, Hiram. Rebuilding Eu- 
rope—After Victory. New York: 
Public Affairs Committee, Inc., 
Public Affairs Pamphlets, No. 81. 
1943. 32 pp. 10 cents. 

Moulton, Harold G. The New Philoso- 
phy of Public Debt. Washington: The 
Brookings Institution. 1943. 93 pp. 
$1.00. 

National Bureau of Economic Re- 
search. Cost Behavior and Price Pol- 
icy. New York: National Bureau of 
Economic Research, Committee on 
Price Determination. 1943. xix, 356 
pp. $3.00. 

National Bureau of Economic Re- 
search. Income Size Distributions, 
Part II, Studies in Income and 
Wealth, Vol. V. New York: National 
Bureau of Economic Research, Con- 
ference on Research in Income and 
Wealth. 1943. xvii, 1681 pp. 

Reeve, Joseph E. Monetary Reform 
Movements—A Survey of Recent 
Plans and Panaceas. Washington: 
American Council on Public Affairs. 
1943. xiv, 404 pp. $3.25. 

Shoup, Carl, Friedman, Milton, and
Mack, Ruth P. Taxing to Prevent
Inflation. New York: Columbia Uni-
versity Press. 1943. viii, 236 pp.
$2.75.

Special Libraries Association. War
Subject Headings for Information
Files, Second Edition. New York:
Special Libraries Association. 1943.
69 pp. $2.00.

Sturmthal, Adolf. The Tragedy of Eu- 
ropean Labor, 1918-1939. New York: 
Columbia University Press. 1943. 
xii, 389 pp. $3.50. 

Tax Institute. Wartime Problems of
State and Local Finance. Philadelphia:
Tax Institute, University of
Pennsylvania. 1943. ix, 267 pp.
$2.50.

Taylor, Henry C., and Taylor, Anne 
Dewees. World Trade in Agricultural 
Products. New York: The Macmillan 
Company. 1943. xviii, 286 pp. $3.50. 

Tomasevich, Jozo. International Agree- 
ments on Conservation of Marine Re- 
sources. Stanford University: Food 
Research Institute. 1943. xi, 297 pp. 
$3.00. 

Truesdell, Leon E. The Canadian Born 
in the United States. New Haven: 
Yale University Press. 1943. xvi, 
263 pp. $3.00. 

The Twentieth Century Fund. Postwar
Planning in the United States, An
Organization Directory. New York:
The Twentieth Century Fund. 1943.
xvi, 101 pp. $1.00.

Walker, Helen M. Elementary Statistical
Methods. New York: Henry Holt
and Company, Inc. 1943. xxv, 368
pp. $2.75.

Wilks, S. S. Mathematical Statistics.
Princeton: Princeton University
Press. 1943. xi, 284 pp. $3.75.

Willis, J. Brooke. The Functions of the 
Commercial Banking System. New 
York: King’s Crown Press. 1943. 
viii, 225 pp. $3.00. 


Government Publications 
Chicago Plan Commission. Industrial
and Commercial Background for
Planning Chicago. Chicago: The
Chicago Plan Commission. 1942.
66 pp.

Department of Commerce, Bureau of
the Census. Service Establishments,
Places of Amusement, Hotels, Tourist
Courts and Tourist Camps, 1939,
prepared under the supervision of
Fred A. Gosnell. Sixteenth Census of
the United States: 1940 Census of
Business, Vol. II. Washington: U. S.
Government Printing Office. 1942.
xvi, 637 pp. $2.00.

Department of Commerce, Bureau of
the Census. Construction: 1939,
prepared under the supervision of
Fred A. Gosnell, and John Albright.
Sixteenth Census of the United
States: 1940, Census of Business:
1939, Vol. IV. Washington: U. S.
Government Printing Office. 1943.
viii, 397 pp. $1.50.

Department of Labor, Division of Labor
Standards. Controlling Absenteeism,
A Record of War Plant Experience.
Special Bulletin No. 12.
Washington: U. S. Government
Printing Office. 1943. 57 pp. 10 cents.

International Labour Office. The International
Standardisation of Labour
Statistics. Montreal: International
Labour Office, Studies and Reports,
Series N (Statistics) No. 25, Revision
of No. 19. 1943. vii, 169 pp.
$1.00.

League of Nations. Relief Deliveries and 
Relief Loans, 1919-1923. Princeton: 
Princeton University Press. 1943. 
62 pp. 

League of Nations. The Transition
from War to Peace Economy. Princeton:
Princeton University Press:
Economic, Financial and Transit
Department of the League of Nations.
II. A. 3. 1943. 118 pp. $1.00.

National Resources Planning Board.
Human Conservation, The Story of
Our Wasted Resources. Washington:
U. S. Government Printing Office.
1943. 126 pp. 20 cents.

Social Security Board. Measurement of
Variations in State Economic and
Fiscal Capacity, by Paul Studenski.
Washington: Social Security Board,
Bureau of Research and Statistics.
1943. 76 pp.

Social Security Board. Statistics of 
Family Composition in Selected Areas 
of the United States, 1934-36, Vol. II, 
The Urban Sample (83 Cities in 18 
States). Washington: Social Security 
Board, Bureau of Research and Sta- 
tistics, Bureau Memorandum No. 
45. 1943. xxxvi, 620 pp. 

Union of South Africa. Annual Accounts,
1941-42 together with Financial
Statements Relating to Hospital
Boards, Municipalities, Village Management
Boards, Local Boards and
Divisional Councils for the Year
Ended 31st December, 1941, with the
Reports of the Provincial Auditor
thereon. Pretoria: Government Printer
of the Union of South Africa.
1943. 236 pp.





